







® Real time Image Pro 





and find out more about it. 


A/D
Converter


NCR 45CG72
GAPP Image
Processing
Subsystem


Data Out


C.G. ELECTRONICS SRL 
TELEF O59 S42i€e 
TELEX 226413 CiGivea : 





Seminar Topics 


• Trends in VLSI Design for Multiple Processor Devices


SKYLAB srl 

20125 Milano - Italy 

Vie M. Giora 66 - Tel 02/668 38 06 
Telex 335535 SKYLAB | 


PROGRAMMABLE SYSTOLIC
ARCHITECTURES
















HANDLING REALTIME IMAGES
COMES NATURALLY TO THE
SYSTOLIC ARRAY PROCESSOR


Come to the NCR Seminar on Geometric Arithmetic Parallel Processing (GAPP) 


• Skeletonization







MICROELECTRONICS
FORT COLLINS, COLORADO

NCR CORPORATION

FOUNDED 1884
TODAY, A MULTINATIONAL CORPORATION WITH:
    23 MANUFACTURING FACILITIES IN [illegible] COUNTRIES
    SALES, SERVICE AND [illegible] FACILITIES IN [illegible] COUNTRIES
    [illegible] BILLION ANNUAL REVENUE
    [illegible] EMPLOYEES

PRODUCTS FOR INFORMATION TECHNOLOGY
    MAINFRAME: [illegible] V-SERIES
    MICROMAINFRAME: 9300 (NCR/32 CHIP SET)
    MINICOMPUTER: TOWER 1632, TOWER XP
    PERSONAL COMPUTERS: DECISION MATE V, PC-4
    POINT-OF-SALE TERMINALS, CARD ACTIVATED DISPENSERS
    TRANSACTION PROCESSING TERMINALS
    CHECK PROCESSING EQUIPMENT, ELECTRONIC FUNDS TRANSFER
    COMTEN COMMUNICATIONS PROCESSORS
    OFFICE SYSTEMS
    TELECOMMUNICATIONS SERVICES
    MICROELECTRONICS





' (2861 


MICROELECTRONICS DIVISION

197[illegible]  FIRST MANUFACTURING FACILITY OPENED IN MIAMISBURG, OHIO
      PMOS LOGIC [illegible] MICROPROCESSORS
      MNOS NON-VOLATILE [illegible]
1975  SECOND FACILITY OPENED IN COLORADO SPRINGS, COLORADO
      NMOS DRAM AND LOGIC
1979  THIRD FACILITY OPENED IN FORT COLLINS, COLORADO
      CMOS LOGIC
1981  CHANGED FROM A CAPTIVE SUPPLIER TO A MERCHANT MARKET
      SEMICONDUCTOR VENDOR. MIAMISBURG AND FORT COLLINS FACILITY
      ENLARGED.
1982  COLORADO SPRINGS FACILITY REPLACED WITH A STATE-OF-THE-ART FAB.
      FIRST FACILITY IN THE INDUSTRY WITH WAFER STEPPERS DEVOTED TO
      VLSI LOGIC MANUFACTURING.

NCR/32 CHIP SET: 2.2 MICRON NMOS
TUNGSTEN SILICIDE GATES








MICROELECTRONICS DIVISION

[Organization chart. Vice President: Dr. J. H. Van Tassel. General managers are shown for the Miamisburg, OH, Colorado Springs, CO and Fort Collins, CO facilities, together with marketing and sales. Processes listed: MNOS, NMOS, CMOS. Products listed include ROM, non-volatile memory, NCR/32, semicustom products, microprocessors and digital signal processing. The remaining names and boxes are illegible.]


MICROELECTRONICS PLANT ORGANIZATION

[Organization chart. General Manager, with Engineering, Personnel, Controller, Product Manufacturing and Q.A. Digital Signal Processing: Paul Sullivan, with Product Marketing, Product Applications and Product Engineering. The remaining names are illegible.]





PROFESSOR FORTES' PRESENTATION WILL BE SUPPLIED AS A SEPARATE 
HANDOUT AT THE SEMINAR 








DIGITAL SIGNAL PROCESSING DEFINITION

SIGNAL PROCESSING IS THE MANIPULATION OF CONTINUOUS TIME SIGNALS
SO THAT USEFUL INFORMATION MAY BE EXTRACTED.

SIGNAL SOURCES
1. [illegible]
2. MEDICAL ([illegible], EEG, CAT)
3. ULTRASONIC
4. SEISMIC
5. [illegible]
6. PROCESS CONTROL SENSORS
   - TEMPERATURE
   - PRESSURE
   - FLOW
   SERVOS
7. RADAR
8. [illegible]
9. INFRARED SCANNER

HIGH SPEED [illegible] OF DATA
OFF-LINE DSP; ARRAY PROCESSORS; FORTRAN CODE; REAL-TIME ANALOG CONTROL
SYSTEMS-ON-A-CHIP; SYSTEMS IMPLEMENTATION OF FEEDBACK LOOPS OF DSP
ALGORITHMS; MILITARY PARALLEL REAL-TIME DSP COMPUTER ARCHITECTURES;
HIGH SPEED BIPOLAR BIT SLICE; ARRAY PROCESSORS; SUPERCOMPUTER

PAUL SULLIVAN  SEPTEMBER 12, 1983





THE ORDER OF MAGNITUDE EFFECT

[Figure: SUPERCOMPUTERS. Speed (MIPS) versus year, with the trend line ONE ORDER OF MAGNITUDE EVERY 5 YEARS.]

PAUL SULLIVAN  JUNE 27, 1983


LINEWIDTH VERSUS YEAR FOR COMPLEX ICS

[Figure: linewidth versus year, 1960 to 1990, with separate curves for RESEARCH, ADVANCED DEVELOPMENT and PRODUCTION.]

PAUL SULLIVAN  SEPTEMBER 13, 1983


INTEGRATED CIRCUIT
PERFORMANCE EVOLUTION

[Figure: GATE SPEED, ns (logarithmic scale) versus YEAR, 1964 to 1988.]


THE ORDER OF MAGNITUDE EFFECT (CONTINUED)

[Figure: SEMICONDUCTOR TECHNOLOGY. Transistors versus year, 1960 to 1990, with the trend line ONE ORDER OF MAGNITUDE EVERY [illegible] YEARS. Curve labels include MICRO and MAINFRAME.]

PAUL SULLIVAN  JUNE 27, 1983


COMPUTER ARCHITECTURE EVOLUTION

1946 - 1950   ENIAC -- 18,000 VACUUM TUBES
              (COMPONENT INTENSIVE)

1950 - 1980   EVOLUTION OF VON NEUMANN ARCHITECTURES --
              HIGH SPEED CPU WITH SEQUENTIAL MEMORY ACCESS
              (SPEED INTENSIVE -- ECL)

1975 - 1985   EVOLUTION OF MOS VLSI SYSTEMS-ON-A-CHIP --
              VON NEUMANN ARCHITECTURE BUT SYSTEM SPEED
              ACHIEVED BY ELIMINATING INTERCHIP DELAYS
              (SYSTEM INTEGRATION, MOS VLSI)

1980 - 1990   EVOLUTION OF PARALLEL/PIPELINED ARRAYS OF
              PROCESSORS. SPEED ACHIEVED THROUGH CONCURRENCY
              (TRANSISTORS ARE FREE -- WIRES ARE EXPENSIVE)

TRANSISTORS/CHIP: 10^[illegible] => 10^[illegible]

PAUL SULLIVAN  JUNE 27, 1983


MULTI-PROCESSOR ARCHITECTURES

[Figure, four panels: BINARY TREE; 2-D SYSTOLIC ARRAY; HEXAGONAL ARRAY PROCESSOR; SHUFFLE-EXCHANGE NETWORK.]

PAUL SULLIVAN  JUNE 27, 1983


TYPES OF HIGHLY PARALLEL ARCHITECTURES

                                                                  HARDWARE     SOFTWARE
                                             SPEED    GENERALITY  EFFICIENCY   REQUIRED

• MULTIPLE SPECIAL PURPOSE FUNCTIONAL UNITS  FASTEST  LOW         HIGH         NONE
    - MATRIX MULTIPLICATION UNIT
    - DISCRETE FOURIER TRANSFORM
    - SYSTOLIC ARRAYS

• ASSOCIATIVE PROCESSORS
    - CONTENT ADDRESSABLE SYSTEM
    - DATA BASE MACHINE

• ARRAY PROCESSORS
    - ILLIAC IV
    - MASSIVE PARALLEL PROCESSOR

• DATA FLOW COMPUTER

• FUNCTIONAL PROGRAMMING LANGUAGE MACHINE
    - MAGO MACHINE

• MULTIPLE PROCESSORS                        SLOWEST  HIGH        LOW          HIGHEST
    - C.MMP                                                                    COMPLEXITY
    - CM*


History of Two Dimensional Arrays of Processors(1) 


The SOLOMON (Simultaneous Operation Linked Ordinal MOdular 
Network) computer(2) was a proposal for a two-dimensional SIMD 
array of 32 by 32 processing elements, each with a bit-serial 
arithmetic unit. Each PE had a local memory of 4096 bits. This 
computer was never built in the form described in Slotnick's 1962 
paper, but it gave birth to Illiac IV, the ICL DAP and the 
Goodyear Aerospace MPP. 


The University of Illinois began the design of a SOLOMON-type 
computer in 1966, which became ILLIAC IV(3). One quadrant of the 
machine was built by Burroughs and delivered to NASA in 1972. It 
consisted of an 8 by 8 array of 64-bit, floating point 
processors, each with 2048 words of 64-bit memory. Based on the 
lessons from Illiac IV, Burroughs developed a commercial design 
(the BSP), consisting of 16 processors and a more elaborate 
memory hierarchy(4). 


During the mid to late 1970s, International Computers Limited 
(ICL) developed a Distributed Array Processor (DAP)(5). The DAP 
consisted of a 64 by 64 array of one-bit processors. Sixteen 
processors, with 4096 bits of memory each, were contained on each 
of 256 boards. 


In 1983, Goodyear Aerospace delivered a Massively Parallel 
Processor (MPP) to NASA(6). This machine consisted of a 128 by 
128 array of processor elements, which was constructed from 2048 
CMOS integrated circuits, each of which contained eight single-bit 
processors. Separate memory devices provide 1024 bits of 
memory for each processor. 


Paralleling the development of MPP-type architectures has been 
the development of systolic architectures(7). In these SIMD 
systems, data is continuously pumped through an array of 
processors. 


Massively parallel architectures are well suited to VLSI 
technology(8). The component density on integrated circuits has 
been doubling every two years. It is now more cost effective to 
increase computational power by massive parallelism than through 
the use of faster transistors. 


Also, with higher levels of integration, the interconnect wires 
are becoming more expensive than the transistors. Thus, 
architectures utilizing two dimensional arrays of processors, 
with only local communication to nearest neighbor processors, are 
easily designed and manufactured in VLSI. 


From 1982 to 1983, Martin Marietta Aerospace, Orlando, Florida, 
developed a VLSI architecture for the Geometric Arithmetic 
Parallel Processor (GAPP). This VLSI component, containing 72 
processor elements with 128 bits of memory for each PE, has been 
designed and manufactured by NCR Microelectronics, Fort Collins, 
Colorado. The commercial product introduction of the GAPP by NCR 
Microelectronics in 1984 will have a major impact on the 
architecture of special purpose computers. For the first time, 
massively parallel architectures can be implemented with low 
cost silicon chips. 
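
To make the sizing consequences concrete, here is a minimal sketch that counts how many 72-PE devices are needed to tile a processor array and how much PE memory that provides. Only the 72 PEs and 128 bits per PE come from the paragraph above; the 12 by 6 arrangement of PEs on a chip and the function names are assumptions made for illustration.

```python
import math

PES_PER_CHIP = 72   # from the text
BITS_PER_PE = 128   # from the text


def gapp_chips_needed(rows, cols, chip_rows=12, chip_cols=6):
    """Chips needed to tile a rows x cols PE array.

    The 12 x 6 chip shape is an assumption for this sketch.
    """
    assert chip_rows * chip_cols == PES_PER_CHIP
    return math.ceil(rows / chip_rows) * math.ceil(cols / chip_cols)


def total_memory_bits(chips):
    """Total on-chip PE memory, in bits, across all chips."""
    return chips * PES_PER_CHIP * BITS_PER_PE


if __name__ == "__main__":
    chips = gapp_chips_needed(48, 48)
    print(chips, "chips,", total_memory_bits(chips), "bits of PE memory")
```

Arrays whose sides are not multiples of the chip dimensions round up, so some PEs at the edges go unused.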


In the future, Wafer Scale Integration (WSI) will push computer 
architectures even further in the direction of parallel 
multiprocessor arrays(9). Research on advanced multiprocessor 
architectures is underway in many universities(10). 


REFERENCES 


1.  R. W. Hockney and C. R. Jesshope, Parallel Computers 
    (Bristol: Adam Hilger, Ltd.), 1981; L. S. Haynes, et al., 
    "A Survey of Highly Parallel Computing," Computer Mag. 15, 
    pp. 9-24 (January 1982). 

2.  D. L. Slotnick, et al., "The SOLOMON Computer," AFIPS Conf. 
    Proc. 22, pp. 97-107 (1962); J. Gregory and R. McReynolds, 
    "The SOLOMON Computer," IEEE Trans. Electron. Comput. EC-12. 

3.  D. L. Slotnick, "The Fastest Computer," Scientific American 
    224 (2), pp. 76-87 (1971); W. J. Bouknight, et al., "The 
    Illiac IV System," Proc. IEEE 60, pp. 369-379 (April 1972). 

4.  C. Jensen, "Taking Another Approach to Supercomputing," 
    Datamation 24 (2), pp. 159-175 (1978). 

5.  S. F. Reddaway, "DAP - A Distributed Array Processor," First 
    Annual Symp. on Comput. Architecture (IEEE/ACM), 1973; P. M. 
    Flanders, et al., "Efficient High Speed Computing with the 
    Distributed Array Processor," [title illegible] 
    (London: Academic Press) pp. 113-128 (1977). 

6.  K. E. Batcher, "Design of a Massively Parallel Processor," 
    IEEE Trans. Comput. C-29, pp. 836-840 (Sept. 1980). 

7.  H. T. Kung, "Why Systolic Architectures," Computer Mag. 15, 
    pp. 37-46 (Jan. 1982); H. T. Kung, "On the Implementation and 
    Use of Systolic Array Processors," Proc. Int'l. Conf. on 
    Computer Design: VLSI in Computers, IEEE, pp. 370-373 
    (Nov. 1983). 

8.  C. Mead and L. Conway, Introduction to VLSI Systems (Reading, 
    Mass: Addison-Wesley) 1980. 

9.  H. T. Kung and M. S. Lam, "Wafer-Scale Integration and Two- 
    level Pipelined Implementations of Systolic Arrays," J. 
    Parallel and Distributed Computing 1, pp. 32-63 (Aug. 1984). 

10. N. Mokhoff, "Parallelism Makes a Strong Bid for Next Generation 
    Computers," Computer Design 23 (10) pp. 104-131 (Sept. 1984). 


COMPUTING STRUCTURES 
FOR IMAGE PROCESSING 


Edited by 
M. J. B. DUFF 


Department of Physics and Astronomy, 
University College London, UK 


British Library Cataloguing in Publication Data 
Computing structures for image processing. 
1. Image processing 
I. Duff, M. J. B. 
621.3819'598 TA1632 


ISBN 0-12-223340-9 
LCCCN 83-70582 


Copyright © 1983 by 


ACADEMIC PRESS, INC. ACADEMIC PRESS INC. (LONDON) LTD 
(Harcourt Brace Jovanovich, Publishers) 
London Orlando San Diego New York 
Toronto Montreal Sydney Tokyo 
ACADEMIC PRESS INC. (LONDON) LTD 
24/28 Oval Road, London NW1 7DX 


United States Edition published by 


ACADEMIC PRESS, INC. 
Orlando, Florida 32887 





Library of Congress Cataloging in Publication Data 
Preston, Kendall, 1927- 
Modern cellular automata. 


(Advanced applications in pattern recognition) 

Bibliography: p. 

Includes indexes. 

1. Cellular automata. I. Duff, M. J. B. II. Title. III. Series. 
QA267.5.C45P74 1984 001.64 84-11672 
ISBN 0-306-41737-5 


Modern 
Cellular Automata 
Theory and Applications 


Kendall Preston, Jr. 

Carnegie-Mellon University 
University of Pittsburgh 
Pittsburgh, Pennsylvania 
Kensal Consulting 
Tucson, Arizona 

and 

Michael J. B. Duff 

University College London 
London, England 


Plenum Press • New York and London 


© 1984 Plenum Press, New York 
A Division of Plenum Publishing Corporation 
233 Spring Street, New York, N.Y. 10013 





NASA/GOODYEAR AEROSPACE
MASSIVELY PARALLEL PROCESSOR

[Figure: chip photograph.]

8 PROCESSOR ELEMENTS/CHIP
CHIP SIZE 5.9 x 3.3 mm
5 µm CMOS
1 BIT [illegible] (BIT SERIAL COMPUTATION)
NO RAM ON CHIP

PAUL SULLIVAN  JUNE 27, 1983

FIGURE [number illegible]. A single processing element from the Goodyear Aerospace MPP (courtesy of Goodyear Aerospace).





COMPUTATION SPEED OF SUPERCOMPUTERS

[Table: 32 BIT FLOATING POINT OPERATIONS PER SECOND, in MFLOPS, by computer. Computers listed: IBM 360/195; IBM ([illegible]-processor); CRAY-1; CRAY-2 ([illegible] processor); CYBER-205 ([illegible]-pipeline); FUJITSU; HITACHI; FPS peripheral array processor; NASA/GOODYEAR MASSIVE PARALLEL PROCESSOR (128 X 128 ARRAY OF CMOS P.E.); GOALS OF JAPANESE SUPER COMPUTER (WILL USE PARALLEL PROCESSORS). The MFLOPS values are illegible.]

PAUL SULLIVAN  JUNE 27, 1983





It 


SINGLE INSTRUCTION MULTIPLE DATA
(SIMD) SYSTOLIC ARRAY PROCESSOR

• TRADEOFF
    • FEW COMPLEX PROCESSORS
    • MANY SIMPLE PROCESSORS

• BIT SERIAL COMPUTATION
    • TRADE PRECISION VS TIME

• AMOUNT OF MEMORY
    • MULTIPLE PROCESSORS WITH DISTRIBUTED MEMORY
    • LARGE MEMORY WITH DISTRIBUTED INTELLIGENCE

[Figure: processor array with NORTH, EAST, SOUTH and WEST edges.]

PAUL SULLIVAN



















[Figure: NCR45CG72.]





DENSITY OF PROCESSOR ELEMENTS

• 1982 GAPP-I
    • 4µ CMOS
    • 3 X 6 ARRAY = 18 PE/CHIP

• 1984 GAPP-II
    • [illegible]µ CMOS + DLM
    • 6 X 12 ARRAY = 72 PE/CHIP

• 1987 GAPP-[illegible]
    • [illegible]µ CMOS
    • 16 X 32 ARRAY = 512 PE/CHIP

[Figure: processor element array with NORTH, EAST, SOUTH and WEST edges.]

PAUL SULLIVAN





[Table: projected GAPP density by CMOS process. Column headings include CMOS PROCESS, ESTIMATED YEAR OF AVAILABILITY, PROCESSOR ELEMENTS PER CHIP, PINS PER CHIP and NUMBER OF GAPP REQUIRED. Legible entries: 1984; 2 microns, 1985; 1 micron, 1987, 512 processor elements; 0.5 micron, 1992, 2048 processor elements. The remaining cells are illegible.]


PRICE TREND FOR GAPP

[Figure: price trend curve. Horizontal axis: YEAR, 1984 to 1991. Vertical axis: 0 to 1.0 (label illegible).]





THE CRITICAL LOOP (assumes a 12.5 MHz MC68000)

LOOP:  MOVE.W  (A0)+, ADDXYWX*2(A2)    17 cycles (1360 nsec.)
       MOVE.W  (A1)+, WRITE_Y*2(A2)    17 cycles (1360 nsec.)
       DBF     D0, LOOP                10 cycles ( 800 nsec.)
                               Total:  44 cycles (3520 nsec.)

Note: Loop can be shortened to 36 cycles (13, 13, 10)
(2880 nsec.) if each write address is kept in a
separate register, and register indirect addressing is
used.





THE ALTERNATIVE - Using the MULS instruction

LOOP:  MOVE.W  (A0)+, D1     8 cycles ( 640 nsec.)
       MULS    (A1)+, D1    74 cycles (5920 nsec.)
       ADD     D1, D2        8 cycles ( 640 nsec.)
       DBF     D0, LOOP     10 cycles ( 800 nsec.)
                    Total: 100 cycles (8000 nsec.)

Note that we haven't even addressed the problem of
getting 40 bits accumulated using this method.
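
The cycle counts on the two loop slides convert to time by dividing by the clock rate, which makes the comparison easy to check. The following is a minimal sketch, assuming the 12.5 MHz clock stated on the slide and the per-instruction cycle counts as read from this copy; the constant and function names are illustrative only.

```python
CLOCK_MHZ = 12.5  # MC68000 clock assumed on the slide


def cycles_to_ns(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a cycle count to nanoseconds at the given clock rate."""
    return cycles * 1000.0 / clock_mhz


# Per-instruction cycle counts as read from the two loop slides.
MAC_LOOP = (17, 17, 10)         # MOVE.W, MOVE.W, DBF
MAC_LOOP_SHORT = (13, 13, 10)   # with register indirect addressing
MULS_LOOP = (8, 74, 8, 10)      # MOVE.W, MULS, ADD, DBF


def loop_ns(cycle_counts):
    """Time for one pass through a loop, in nanoseconds."""
    return cycles_to_ns(sum(cycle_counts))


def speedup(slow, fast):
    """Ratio of loop times, slow over fast."""
    return loop_ns(slow) / loop_ns(fast)


if __name__ == "__main__":
    for name, loop in (("MAC", MAC_LOOP), ("MAC short", MAC_LOOP_SHORT),
                       ("MULS", MULS_LOOP)):
        print(name, sum(loop), "cycles", loop_ns(loop), "ns")
    print("MULS vs MAC:", round(speedup(MULS_LOOP, MAC_LOOP), 2))
```

This compares loop times only. It does not model the extra work the MULS version would need to accumulate 40 bits, which the note above points out.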


THE LOW COST, LOW POWER

NCR45CM16 SINGLE PORT MAC
REPRESENTS AN ATTRACTIVE ALTERNATIVE
TO COPROCESSORS OR DSP MICROPROCESSORS
FOR MULTIPLY INTENSIVE OPERATIONS
IN NEW OR EXISTING 16-BIT MICROPROCESSOR SYSTEMS.








SINGLE PORT MULTIPLIER/ACCUMULATOR

SINGLE PORT DESIGN
    - INTERFACES DIRECTLY WITH 16 BIT MICROPROCESSORS
    - SINGLE PORT "LOOKS" LIKE A STATIC RAM ON THE BUS
    - NO CLOCK IS REQ'D

LOW POWER CMOS: 10 mA OPERATING CURRENT (MAX)
                ([illegible] STANDBY CURRENT)

40 BIT ACCUMULATOR WITH ADD/SUBTRACT CAPABILITY

24 PIN PACKAGE: 600 MIL WIDE PLASTIC DIP
                300 MIL WIDE CERAMIC SKINNY DIP

220 NS CYCLE TIME (WORST CASE)
PIPELINED MULTIPLY/ACCUMULATE OPERATIONS REQUIRE ONLY 2 CYCLES A PIECE.





[illegible] MULTIPLICATION ACCELERATOR

THE NCR45CM16 REQUIRES ONLY 220NS FOR MULTIPLY OPERATIONS AND CAN BE DIRECTLY
INTERFACED TO ANY MONOLITHIC 16 BIT MICROPROCESSOR WITH NO WAIT STATES.
(AT 15 MHZ, 4 CYCLES = 267NS; [illegible] IS INCLUDED.)

WHEN USED WITH MC68000 BASED SYSTEMS, THE NCR45CM16 REDUCES MULTIPLY TIME BY A FACTOR
OF >3 COMPARED TO UNAIDED 68000.

FOR EXISTING SYSTEMS, LOW POWER CMOS ELIMINATES THE NEED FOR EXTRA POWER ON BOARD,
WHILE THE 300 MIL SKINNY DIP PACKAGE FITS INTO THE SAME BOARD SPACE AS A STATIC
RAM. (THE [illegible] IS THE [illegible] AVAILABLE TODAY, WITH THE LOWEST
POWER CONSUMPTION.)


FUNCTIONAL BLOCK DIAGRAM

[Block diagram: INPUT/OUTPUT PORT, X-REGISTER, Y-REGISTER, 16 x 16 MULTIPLIER ARRAY, ACCUMULATOR, MULTIPLEXER.]


READ OPERATIONS (WE = 1)

OPERATION
[table rows illegible]

NOTE: Accumulator accumulates to 40 bits. Thus bits 0 - 39 are valid, while bits 40 - 47 are a sign extension of bit 39.

WRITE OPERATIONS (WE = 0)

MULTIPLIER OPERATION
NOP. Retain X and Y
Write new data to Y
Write new data to X
Write new data to both X and Y
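
The accumulator note on the read-operations slide (40 valid bits, with bits 40 to 47 repeating bit 39) can be illustrated with a short sketch. It assumes a two's complement accumulator read as three 16-bit words (low word, bits 16-31, bits 32-47); the function name is hypothetical.

```python
def acc_read_words(acc):
    """Split a signed 40-bit accumulator value into three 16-bit read words.

    Returns (low, mid, high). Bits 40-47 of the high word repeat bit 39,
    which is the sign extension the slide describes.
    """
    if not -(1 << 39) <= acc < (1 << 39):
        raise ValueError("value does not fit in 40 bits")
    raw = acc & ((1 << 48) - 1)  # two's complement, sign-extended to 48 bits
    return raw & 0xFFFF, (raw >> 16) & 0xFFFF, (raw >> 32) & 0xFFFF


if __name__ == "__main__":
    for value in (0x123456789A, -1, -(1 << 39)):
        print(value, [hex(word) for word in acc_read_words(value)])
```

For a negative value the top byte of the high word is all ones, so a host can combine the three words into a wider signed integer without extra masking.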


EXAMPLE OPERATIONS

MULTIPLY TWO 16 BIT NUMBERS, READ 32 BIT RESULT

INSTRUCTION (control codes illegible)
    CLR ACC., WRITE X
    CLR ACC., WRITE Y
    ADD X*Y TO ACC.
    READ LOW ORDER RESULT
    READ HIGH ORDER RESULT





EXAMPLE OPERATIONS

MULTIPLY TWO 16 BIT NUMBERS AND ACCUMULATE, REPEAT 3 TIMES, READ 40 BIT RESULT =
X1 * Y1 + X2 * Y2 + X3 * Y3

INSTRUCTION (control codes illegible)
    CLR ACC., WRITE X1
    CLR ACC., WRITE Y1
    ACC. = X1*Y1, WRITE X2
    WRITE Y2
    ACC. = X1*Y1+X2*Y2, WRITE X3
    WRITE Y3
    ACC. = X1*Y1+X2*Y2+X3*Y3
    READ MOST SIGNIFICANT BITS OF RESULT
    READ BITS 16-31 OF RESULT
    READ LEAST SIGNIFICANT BITS OF RESULT





MC68000 to 45CM16 INTERFACE 












SUBROUTINE: Sum Products
Author:     Todd Davies

* Implements the formula: A = (X1)*(Y1)+(X2)*(Y2)+...+(Xn)*(Yn)
* A0 points to the first element in the X list.
* A1 points to the first element in the Y list.
* D0 contains the number of products to be summed.
* Result of product sum is returned in D2 and low byte of D3.

NCRMAC   EQU     XXXX                    Base address for MAC chip.

* Write offsets
WXYCLRA  EQU     $3                      Write to both X and Y, clear Acc.
ADDXYWX  EQU     $6                      Add X*Y to Acc., put new data in X.
WRITE_Y  EQU     $9                      Write new data to Y.

* Read offsets
A_LOW    EQU     $0                      Low word of Acc.
A_MID    EQU     $1                      Bits 16-31 of Acc.
A_HIGH   EQU     $2                      Bits 32-47 (40-47 extended).

START:   MOVE.W  #NCRMAC, A2
         CLR.W   WXYCLRA*2(A2)           Clear X, Y, and Acc.
LOOP:    MOVE.W  (A0)+, ADDXYWX*2(A2)    Mult./Acc., write next X.
         MOVE.W  (A1)+, WRITE_Y*2(A2)    Write next Y.
         DBF     D0, LOOP
         MOVE.W  D0, ADDXYWX*2(A2)       Last mult./acc.
         MOVE.W  A_LOW*2(A2), D1         Fetch low word in acc.
         MOVE.W  A_MID*2(A2), D2         Fetch bits 16-31 in acc.
         SWAP    D2                      Swap upper and lower words of D2.
         MOVE.W  D1, D2                  Convert to single 32 bit.
         MOVE.W  A_HIGH*2(A2), D3        Fetch high acc. word.
         RTS
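
The data flow of the subroutine can be followed in a small behavioural model. This is a sketch only, not a description of the device: it assumes the three write operations behave as their comments in the listing say, treats operands as signed Python integers, and uses hypothetical class and method names. One detail to keep in mind when reading the listing: on the MC68000, DBF executes the loop body D0 + 1 times, whereas the model simply iterates once per list element.

```python
class MacModel:
    """Behavioural model of the single-port multiplier/accumulator."""

    def __init__(self):
        self.x = 0
        self.y = 0
        self.acc = 0

    def write_xy_clear(self, data):
        """WXYCLRA: write to both X and Y, clear the accumulator."""
        self.x = self.y = data
        self.acc = 0

    def add_xy_write_x(self, data):
        """ADDXYWX: add X*Y to the accumulator, then load new data into X."""
        self.acc += self.x * self.y
        self.x = data

    def write_y(self, data):
        """WRITE_Y: load new data into Y."""
        self.y = data


def sum_products(xs, ys):
    """Mirror the subroutine: returns X1*Y1 + X2*Y2 + ... + Xn*Yn."""
    mac = MacModel()
    mac.write_xy_clear(0)            # CLR.W  WXYCLRA*2(A2)
    for x, y in zip(xs, ys):         # LOOP ... DBF
        mac.add_xy_write_x(x)        # adds the previous product, loads X
        mac.write_y(y)
    mac.add_xy_write_x(0)            # last mult./acc.; the value written is unused
    return mac.acc


if __name__ == "__main__":
    print(sum_products([1, 2, 3], [4, 5, 6]))
```

Because each product is added one step after its operands are written, the first pass through the loop adds 0 * 0, and the extra write after the loop is what folds in the final product.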


NCR45CG72
GEOMETRIC ARITHMETIC PARALLEL PROCESSOR

APPLICATIONS

• PATTERN RECOGNITION
    • Correlation
    • Sobel Transform
    • Spoke Filter
    • Template Matching
    • Automated Inspection
    • Machine Vision

• PIXEL DATA PROCESSING
    • Convolution
    • Matrix Operations
    • Histogram
    • Search and Sort

• IMAGE PROCESSING
    • Image Enhancement
    • Edge Detection
    • 2-Dimensional Convolution
    • Compression
    • Spatial Filtering
    • Differential Imaging

• ASSOCIATIVE PROCESSOR
    • Content Addressable Memory
    • [illegible] Search
    • Hamming Distance








NCR45CG72
PROCESSOR ELEMENT AND DATA BUS IDENTIFICATION
TOP VIEW OF PACKAGE

[Figure: grid of processor elements identified by hexadecimal row and column (rows through B, columns 0 to 5), with north and south data lines at the top and bottom edges and west (W) and east (E) data lines at the sides.]

GLOBAL CONNECTION TO EVERY PROCESSOR ELEMENT
    Control Lines  C0 - CC
    RAM Address    RA0 - RA6




















NCR45CG72
BLOCK DIAGRAM OF CONNECTIONS BETWEEN
FOUR PROCESSOR ELEMENTS

[Figure: four processor elements with bidirectional noninverting I/O buffers and an open drain, 72-input GLOBAL OUTPUT.]

OE = Output Enable is an internal connection.
East Outputs enabled whenever C5 = 1 and C6 = 1 and C7 = 0 (EW:=W)
West Outputs enabled whenever C5 = 0 and C6 = 1 and C7 = 1 (EW:=E)
North Outputs enabled whenever C2 = 1 and C3 = 1 and C4 = 0 (NS:=S)
South Outputs enabled whenever C2 = 0 and C3 = 1 and C4 = 0 (NS:=N)
GO is pulled low whenever any NS register contains 1





4 












NCR45CG72
SCHEMATIC DIAGRAM OF ONE PROCESSOR ELEMENT

[Figure: one processor element with its 128 X 1 bit RAM, RAM address inputs A0 to A6, and global output.]





ARITHMETIC OPERATIONS

[Table: adder/subtracter truth table of inputs and outputs. Cell values illegible.]






LOGIC OPERATIONS

LOGICAL OPERATION (operation names illegible; outputs and conditions as read)

    [output illegible]    EW=0, C=1
    [output illegible]    NS=0, C=1
    [output illegible]    NS=0, EW=1

    CY = NS•EW            C=0
    CY = EW•C             NS=0
    CY = NS•C             EW=0
    BW = NS EW            C=0

    CY = NS+EW
    BW = NS+EW
    BW = EW+C

    SM = NS ⊕ C
    SM = NS ⊕ EW
    SM = EW ⊕ C
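
The logic-operations table follows from ordinary full-adder algebra: holding one of the three inputs at a constant turns the sum and carry outputs into XOR, AND, OR or NOT of the remaining inputs. The sketch below checks those identities exhaustively. It assumes SM and CY are the sum and carry of a one-bit full adder with inputs NS, EW and C; the borrow output is left out because its exact definition is not legible in this copy.

```python
from itertools import product


def full_adder(ns, ew, c):
    """Return (SM, CY) for one-bit inputs NS, EW and C."""
    sm = ns ^ ew ^ c
    cy = (ns & ew) | (ns & c) | (ew & c)
    return sm, cy


def check_logic_identities():
    """Verify the logic functions obtained by fixing one adder input."""
    for a, b in product((0, 1), repeat=2):
        assert full_adder(a, b, 0)[1] == (a & b)   # CY = NS AND EW when C = 0
        assert full_adder(a, b, 1)[1] == (a | b)   # CY = NS OR EW when C = 1
        assert full_adder(a, b, 0)[0] == (a ^ b)   # SM = NS XOR EW when C = 0
        assert full_adder(a, 0, 1)[0] == 1 - a     # SM = NOT NS when EW = 0, C = 1
    return True


if __name__ == "__main__":
    print(check_logic_identities())
```

By symmetry the same identities hold for any pair of inputs, which matches the three variants of each row in the table.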


NCR45CG72
INSTRUCTION SET

Control Lines: [bit patterns illegible]

Mnemonic     Description

CM:=CM       MICRO-NOP
CM:=RAM      LOAD CM FROM RAM
CM:=CMS      MOVE FROM CMS INTO CM
CM:=0        LOAD 0 INTO CM

NS:=NS       MICRO-NOP
NS:=RAM      LOAD NS FROM RAM
NS:=N        MOVE FROM N INTO NS
NS:=S        MOVE FROM S INTO NS
NS:=EW       MOVE FROM EW INTO NS
NS:=C        MOVE FROM C INTO NS
NS:=0        LOAD 0 INTO NS

EW:=EW       MICRO-NOP
EW:=RAM      LOAD EW FROM RAM
EW:=E        MOVE FROM E INTO EW
EW:=W        MOVE FROM W INTO EW
EW:=NS       MOVE FROM NS INTO EW
EW:=C        MOVE FROM C INTO EW
EW:=0        LOAD 0 INTO EW

C:=C         MICRO-NOP
C:=RAM       LOAD C FROM RAM
C:=NS        MOVE FROM NS INTO C
C:=EW        MOVE FROM EW INTO C
C:=CY        LOAD C FROM CARRY
C:=BW        LOAD C FROM BORROW
C:=0         LOAD 0 INTO C
C:=1         LOAD 1 INTO C

READ         READ FROM RAM
RAM:=CM      LOAD RAM FROM CM
RAM:=C       LOAD RAM FROM C
RAM:=SM      LOAD RAM FROM SUM




43 





NCR45CG72
TIMING DIAGRAM

[Figure: CLOCK, instruction, data in and data out waveforms. VIH = 2.0V, VIL = 0.8V.]

NOTE: 1,2,3 refer to the staging sequence of instruction, data in and data out. 





GAPP SYSTEM IMPLEMENTATION

[Block diagram: GAPP ARRAY OF PROCESSORS with EAST-TO-WEST PROGRAMMABLE WRAP-AROUND and GLOBAL OUTPUT; GAPP SYSTEM CONTROLLER; CORNER CONTROL; VIDEO DATA IN and VIDEO DATA OUT through the CORNER TURN LINE BUFFER; HOST CPU BUS.]

GAPP ARRAY AND BUFFER








[Figure: the input frame buffer is partitioned into 12 windows. One window at a time is mapped onto the GAPP array; window output data from the array is written to a frame buffer holding previously computed window data.]

This method trades off additional I/O processing time for simplicity
of programming and minimum [illegible] the GAPP array.
re p 4 —_ 
IN e 


[Figure: the image is divided into 12 window partitions; each subimage of p pixels is mapped into the RAM of a single GAPP processing element of the GAPP array.]

Each pixel of a subimage is stored in (GAPP) RAM in the processing element
that the sub-image is mapped to. A program executes p times with address
and shift operations to accommodate the mapping. This method trades off
complicated programming and [illegible] usage for [illegible].


al] ey, 








[Figure: lines of a frame entering the PROCESSOR ARRAY. Labels: input line of frame; RAM of processor array; EW LATCH PLANE; CM:=RAM(x); EW:=RAM(x).]





[Block diagram: NCR45CG72 array with parallel I/O to shift registers. Universal shift register such as 74HC299 (I/O: tri-state, serial in and serial out). Address and read control select the port of each S.R.]








INPUTTING

1. Load S.R. with a line of window data.
2. Shift into array; the array shifts all previously input lines.
3. REPEAT for each line of the window. (3 instructions: CM:=RAM
                                                        CM:=CMS
                                                        RAM:=CM)

OUTPUTTING

1. Read a line from the array. (Instructions: same as above.)
2. Read shift registers by addressing sequentially.
3. REPEAT for each line of the window.

Disadvantages: Reading from and writing to the shift registers is not
concurrent with reading or writing the array.
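
The three-instruction input step can be pictured with a small model of one RAM bit plane. This is a sketch under stated assumptions: CM:=CMS is taken to load each PE's CM register from its south neighbour, with the new line from the shift registers entering at the south edge, so previously loaded lines move one row north on every step. The function names and the list-of-rows representation are illustrative only.

```python
def shift_in_line(ram_plane, line):
    """One input step for a single RAM bit plane.

    ram_plane is a list of rows (north row first); line is the new row
    of window data presented at the south edge of the array.
    """
    cm = [row[:] for row in ram_plane]   # CM:=RAM
    cm = cm[1:] + [list(line)]           # CM:=CMS (each row takes its south neighbour)
    return cm                            # RAM:=CM


def load_window(lines, rows, cols):
    """Shift a window into an initially empty rows x cols plane."""
    plane = [[0] * cols for _ in range(rows)]
    for line in lines:
        plane = shift_in_line(plane, line)
    return plane


if __name__ == "__main__":
    for row in load_window([[1, 2], [3, 4], [5, 6]], rows=3, cols=2):
        print(row)
```

After as many steps as the array has rows, the first line loaded sits in the north row and the last in the south row. Output is the same process in reverse, one line per step.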


1 


[Block diagram: NCR45CG72 array with corner turn buffer and control.]





INPUTTING

1. Load 45C[illegible] with a line of window data.
2. Shift into array while array shifts all previously input lines.
3. REPEAT for each line of the window. (3 array instructions: CM:=RAM
                                                              CM:=CMS
                                                              RAM:=CM)

OUTPUTTING

1. Load a line from the array. (Instructions: same as above.)
2. Read shift registers by addressing sequentially.
3. REPEAT for each line of the window.

Advantage: Loading of corner turn buffer is concurrent with the reading
of the array.


[Block diagram: NCR45CG72 array with RAM (R/W, in and out, bits) and parallel I/O to shift registers. Universal shift register such as 74HC299 (I/O: tri-state, serial in and serial out). Address and read control select the port of each S.R.]





INPUTTING

1. Load S.R. with a line of window data.
2. Shift into RAM.
3. REPEAT for each line of the window.
4. Read RAM data and shift into array, a bit plane at a time.

OUTPUTTING

1. Shift contents of array into RAM, a bit plane at a time.
2. Load a line from the RAM.
3. Read shift registers by addressing sequentially.
4. REPEAT for each line of the window.

Disadvantages: Reading from and writing to the shift registers is not
concurrent with reading or writing the array.

Advantages: Redundant [illegible] data [illegible].


Rano 





pores) 


peadtay wily 






Wt 


SUG 


APM ZLN0OSF HON 


900000 ra 





OOVHO109 ‘SNIT109 1HO4 


fou. - SOINOH1939130N9IN 


INPUTTING

1. Load 45CG72 with a line of window data.
2. Load to processor element RAM.
3. REPEAT for each line of the window.
4. Load array by shifting in bit planes, shifting the word position in corner-turn. (3 instructions in CT: CM:=RAM, CM:=CMS, RAM:=CM)

OUTPUTTING

1. Load a bit line from the array. (3 instructions: same as above.)
2. Repeat for an entire bit plane (3 instructions per line).
3. Read RAM and shift out line of gray scale window data.

Advantage: Loading of corner turn buffer is concurrent with the loading of the array, and horizontal redundant data is preserved and utilized.

Disadvantage: Limited amount of RAM in 45CG72 processor elements.


Highly Parallel Data I/O

[Figure: NCR 45CG72 array with its output and input paths (see Fig 4b)]
be JI oar 3 Highly Parallel Data Separawean 


Sub-window I/O

Shift Registers independently controlled
High Speed Data I/O >10 MHz
Shift, Load and Select

Line Set Frame Buffer

[Figure: video input passes through line set frame buffers LSFB1, LSFB2, LSFB3, etc.; output goes to the GAPP system; window mapping onto the 45CG72 array]





MEMORY EXPANSION (A)

[Figure: GAPP device]

1 CYCLE DELAY IN EXECUTION PER PLANE
12 CYCLES INTERLEAVED PER PLANE





MEMORY EXPANSION (B)

[Figure: GAPP device]

1 CYCLE DELAY IN EXECUTION PER PLANE
12 CYCLES INTERLEAVED PER PLANE


GAPP System Controller I. 


Control 





Control/Address 


Sequencer Commands 


Queue Commands


Loop1 (Decr cntr1) Pause for video
Loop2 (Decr cntr2) Halt


ROM Utilities


GAPP Read Buffer to Array 
GAPP Write Array to Buffer 
Load Window Read Buffer 
Save Window Write Buffer 
Test 

Write (Array edge) 

Read (Array edge) 


R/W Buffer  — Continue 
GAPP Read Interrupt 
GAPP write Breakpoint 


Load cntr1 Breakpoint
Load cntr2 Jump
Loop Window Jump on G.O.
Load Window R/W Buffer
Clear G.O. Register Shift G.O.
Board Status Host Commands
Busy Load Window 
Halt Save Window 
Break Buffer to Array 


Array to Buffer 
Test 


Clear G.O. Register 
Read G.O. Register 


GAPP System Controller L. 


Data 





Host CPU Bus 


Controller Commands 


(In addition to the Sequencer Commands) 


Additional Queue Commands Additional ROM Utilities 


Call Subr (none)
Return
Load AR Registers
Load CT Registers
ALU Control


Additional Interface Control


(none) 


OVERVIEW

• SYSTEM DEFINITION
• SIGNAL PROCESSING
• HARDWARE ARCHITECTURE
• HARDWARE
• SOFTWARE ARCHITECTURE
• SOFTWARE
• HARDWARE/SOFTWARE INTEGRATION


SYSTEM REQUIREMENTS

• 512 X 512 FRAME
• 30 FRAMES PER SECOND
• REAL TIME DETECTION
• GRAY BALL ON BLACK BACKGROUND
• RADIUS OF BALL = 5 TO 7 PIXELS
• OUTPUT COORDINATES IN REAL TIME


OBJECT DETECTION REQUIREMENTS

• GRAY BALL IMPLIES EDGE DETECTION
• CLOSED EDGE CONTOUR REQUIRED
  ECCENTRICITY CLOSE TO 1 REQUIRED
• REJECT OBJECT TOO SMALL
• REJECT OBJECT TOO LARGE
• OBJECT MUST BE CONVEX
• SOBEL/SPOKE FILTER NATURAL


SOBEL FILTER CODE ESTIMATE

• SOBEL X             105 GAPP INSTRUCTIONS
• SOBEL Y             105 GAPP INSTRUCTIONS
• MAGNITUDE X         38 GAPP INSTRUCTIONS
• MAGNITUDE Y         38 GAPP INSTRUCTIONS
• VECTOR MAGNITUDE  = 30 GAPP INSTRUCTIONS
• ANGLE             = 256 GAPP INSTRUCTIONS
• THRESHOLD         = 23 GAPP INSTRUCTIONS
• AND               = 18 GAPP INSTRUCTIONS
• TOTAL             = 613 GAPP INSTRUCTIONS


SPOKE FILTER CODE ESTIMATE

• POSITIVE GRADIENT SPOKE DETECTION
• NEGATIVE GRADIENT SPOKE REJECTION
• OUTER RADIUS FROM 1 - 8 PIXELS
• ALL SPOKES OF SAME LENGTH
• THRESHOLD TEST AT EACH RADIUS
• 55 INSTRUCTIONS/LENGTH ONE WAY
• ARMS OF LENGTH EIGHT
• TWO SPOKE FILTERS ARE REQUIRED
• 880 GAPP INSTRUCTIONS IS HIGH (16 X 55)


SUBDIVIDING THE FRAME OF DATA

SOBEL     613 GAPP INSTRUCTIONS
SPOKE     880 GAPP INSTRUCTIONS
OUTPUT  = 500 GAPP INSTRUCTIONS
RESERVE = 500 GAPP INSTRUCTIONS

0.75% OF FRAME TIME
(512 X 512) X .0075/72 = 25 GAPP CHIPS
EDGE EFFECTS CAUSE OVERHEAD
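The instruction budget on these estimate slides can be checked with a few lines of arithmetic. The sketch below is only a cross-check of the slide figures: it assumes one GAPP instruction per cycle of a 10 MHz clock (the rate quoted elsewhere in these slides) and takes the 100-block subdivision from the overlap slide; the variable names are illustrative.

```python
# Instruction counts as read from the Sobel and spoke estimate slides.
sobel_parts = {
    "sobel_x": 105, "sobel_y": 105,
    "magnitude_x": 38, "magnitude_y": 38,
    "vector_magnitude": 30, "angle": 256,
    "threshold": 23, "and": 18,
}
sobel_total = sum(sobel_parts.values())           # 613 on the slide
spoke_total = 16 * 55                             # 880 on the slide
output_total = 500
reserve_total = 500
budget = sobel_total + spoke_total + output_total + reserve_total

CLOCK_HZ = 10e6      # assumption: one instruction per 10 MHz clock cycle
FRAME_RATE = 30      # frames per second, from the system requirements
BLOCKS = 10 * 10     # 48 x 48 pixel steps across the frame

block_time_s = budget / CLOCK_HZ                  # time to process one block
frame_fraction = block_time_s * FRAME_RATE        # share of one frame time
frame_processing_ms = BLOCKS * block_time_s * 1e3

print(budget, round(frame_fraction * 100, 2), round(frame_processing_ms, 1))
```

Under these assumptions one pass costs about 0.75% of a frame time, and 100 passes take roughly 25 milliseconds, which agrees with the processing allotment quoted on the overlap slide.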





REQUIRED VERTICAL/HORIZONTAL OVERLAP

• SQUARE ARRAY OF CELLS = SQRT (1800)
• STEP ARRAY IN 48 X 48 PIXEL BLOCKS
• 60 X 60 ARRAY OF CELLS
• 10 X 10 = 100 BLOCKS
• 25 MILLISECOND PROCESSING ALLOTMENT





FRAME BUFFER REQUIREMENT

DATA ACQUISITION BUFFER (60 ROWS)
DATA COMPUTATION BUFFER (60 ROWS)
OVERLAP ACQUISITION BUFFER (8 ROWS)
OVERLAP COMPUTATION BUFFER (8 ROWS)


CORNER TURN BUFFER MEMORY

• DATA FORMATTING FOR GAPP PROCESSOR
• OUTPUT FORMATTING FOR HOST
• GAPP CHIP IMPLEMENTATION
• 12 X 60 GAPP ARRAY


BASELINE HARDWARE REQUIREMENTS

• PING PONG 68 ROW INPUT BUFFERS
• 12 X 60 CORNER TURN GAPP ARRAY
• 60 X 60 GAPP ARRAY
• HOST CONTROLLER
• TWO HIGH SPEED SINGLE ALGORITHM CONTROLLERS





• 50 GAPP CHIPS (10 X 5 ARRAY)
• I/O VIA COMMUNICATION LINES
• NO EAST-WEST WRAP AROUND
• NO NORTH-SOUTH WRAP AROUND








CORNER TURN I/O GAPP ARRAY

• 10 GAPP CHIPS (2 X 5 ARRAY)
• COMMUNICATIONS LINES FOR FRAME BUFFER I/O
  EAST-WEST LINES FOR GAPP PROCESSOR I/O
• NO EAST-WEST WRAP AROUND
• NO NORTH-SOUTH WRAP AROUND





GAPP CONTROLLER ARCHITECTURE

• LOW SPEED CONTROL FROM HOST
• 24 BIT CONTROL WORD
• MAIN PROCESSOR/CORNER TURN
• ADDRESS AND DATA REGISTERS FROM HOST
• SINGLE SUBROUTINE SEQUENCE

[Figure: control words read at high speed from memory (1K X 24 bits, 6 chips) go to GAPP control at 10 MHz]





GAPP FRAME BUFFER

[Figure: two 32K X 8 memories (16 RAM chips each), each with a tristate register]


SYSTEM CONTROLLER

[Figure: host processor connected to the buffer controller, the GAPP controller and the corner turn controller; results return from the GAPP corner turn memory]
CONTROL, ADDRESS & COMM LINES

• DUAL LINE DRIVES TO CHIPS
• H - TOPOLOGY
• EQUAL LOADING ON DRIVERS

[Figure: GAPP device driven from a high drive buffer]





GROUND PLANES & DECOUPLING

• FULL SURFACE POWER PLANES
• FULL SURFACE GROUND PLANES
• CAPACITIVE DECOUPLING AT EVERY CHIP
  POWER DECOUPLING ON BOARD





SOFTWARE IMPLEMENTATION

• CORNER-TURN CODE
  SOBEL CODE
• SPOKE CODE
  DETECTION CODE
  COMMON CODE BLOCKS FOR MAIN GAPP
  GAPP CODE LINKER
  DETAILED CODE SIMULATION


PRODUCT SIMULATION

NECESSARY FOR EFFICIENT DEVELOPMENT

AVAILABLE FOR: IBM PC/XT
               NCR PC-4
               DEC PDP-11
               NCR TOWER

WRITTEN IN C UNDER UNIX

SMALL ARRAY SIMULATION SUFFICIENT

ASSEMBLER AVAILABLE


CORNER TURN CODE

SHIFT 60 - 8 BIT WORDS INTO ARRAY
TRANSFER DATA TO EAST-WEST REGISTERS
SHIFT DATA INTO BOTTOM ROW OF MAIN GAPP
MOVE LAST DATA INTO NEXT ROW UP IN GAPP
REPEAT CODE

24 GAPP INSTRUCTIONS PER ROW
RAM := CM
CM := CMS
CM := RAM





SOBEL CODE

SOBEL-X MODULE
SOBEL-Y MODULE
MAGNITUDE MODULE
ANGLE MODULE
THRESHOLD MODULE
MODULES LINKED AUTOMATICALLY





SPOKE FILTER STRATEGY

MEMORY FOR EIGHT POSITIVE GRADIENT SUMS IN GAPP
MEMORY FOR EIGHT NEGATIVE GRADIENT SUMS IN GAPP
CODE BLOCK FOR EACH SPOKE
POSITIVE GRADIENTS FIRST
THRESHOLD ALL 16 SUMS
LOGIC DECIDES ON PROPER DETECTION





45 DEGREE SPOKE FILTER EXAMPLE

• INITIALIZE RAM 14-45
• INITIALIZE RAM 46-77
• GRADIENT BITS IN RAM 0-7
• RAM 8 FOR TEMPORARY STORAGE
• BASIC CODE FOR ALL SPOKES
  BASIC CODE FOR BOTH GRADIENTS
  EIGHT PASSES THROUGH BASIC CODE

45 DEGREE SPOKE MODULE

BASIC GAPP CODE

1. NS:=RAM 1; EW:=0; C:=0
2. RAM 8:=SM
3. NS:=N
4. EW:=NS
5. EW:=E; NS:=0; C:=0
6. NS:=SM
7. NS:=RAM 18
8. RAM 18:=SM; C:=CY
9. EW:=RAM 19; NS:=0
10. RAM 19:=SM; C:=CY
11. EW:=RAM 20; NS:=0
12. RAM 20:=SM; C:=CY
13. RAM 21:=C

NEGATIVE GRADIENT SPOKE

CHANGE RAM 1 TO RAM 5
CHANGE RAM 18→21 TO RAM 50→53


CHANGES FOR OTHER SPOKES

• SPOKES 0 AND 4
  REMOVE STATEMENT 3

• SPOKES 2 AND 6
  REMOVE STATEMENT 4
  CHANGE STATEMENT 5 TO EW:=0; C:=0

• SPOKE N POSITIVE GRADIENT
  CHANGE RAM 1 TO RAM N
  CHANGE RAM 18→21 TO RAM 14+4N→17+4N

• SPOKE 0 - 3 NEGATIVE GRADIENT
  CHANGE RAM 1 TO RAM (N+4)
  CHANGE RAM 18→21 TO RAM 46+4N→49+4N

• SPOKE 4 - 7 NEGATIVE GRADIENT
  CHANGE RAM 1 TO RAM (N-4)
  CHANGE RAM 18→21 TO RAM 46+4N→49+4N
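The per-spoke substitutions listed on this slide amount to a small address mapping. A minimal sketch, assuming the readings 14+4N→17+4N for the positive sums and 46+4N→49+4N for the negative sums are correct; the function name `spoke_addresses` and the dictionary keys are hypothetical, not part of the GAPP tools.

```python
def spoke_addresses(n):
    """RAM addresses the basic spoke code would use for spoke n (0-7).

    Sketch only: the offsets follow the substitution rules as read
    from the slide, not a verified NCR listing.
    """
    if not 0 <= n <= 7:
        raise ValueError("spoke number must be 0-7")
    return {
        # gradient bit tested for the positive-gradient pass
        "positive_gradient_bit": n,
        # the negative pass uses the opposite spoke's gradient bit
        "negative_gradient_bit": n + 4 if n <= 3 else n - 4,
        # 4-bit running sums (first and last RAM address)
        "positive_sum": (14 + 4 * n, 17 + 4 * n),
        "negative_sum": (46 + 4 * n, 49 + 4 * n),
    }


for n in range(8):
    print(n, spoke_addresses(n))
```

For spoke 1 this reproduces the addresses used in the 45 degree example (RAM 1 and RAM 18→21 for the positive pass, RAM 5 and RAM 50→53 for the negative pass), and spoke 7 ends at RAM 45 and RAM 77, matching the two initialization ranges.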


DETECTION LOGIC

TEST POSITIVE GRADIENT FOR LENGTH
TEST NEGATIVE GRADIENT FOR LENGTH
ANY YES ELIMINATES TARGET AREA
TEST POSITIVE GRADIENT FOR LENGTH
YES IMPLIES OBJECT DETECTION
1-8





DETECTION OUTPUT

MOVE DETECTION BITS TO N-S REGISTER
USE GO OUTPUT TO DETECT
SAVE 60 X 60 BLOCK COORDINATES IN HOST
SHIFT N-S OUT TO CORNER TURN GAPP
USE GO TO DETECT ROW
SHIFT E-W OUT OF CORNER TURN GAPP
USE GO TO DETECT COLUMN
SAVE REFINED COORDINATES IN HOST




CORNER TURN CODE

Turn Buffer           Processor Array
Instruction Queue     Instruction Queue
EW := NS              CM := RAM(0)
EW := E               CM := CMS
NOP                   RAM(0) := CM
NOP                   CM := RAM(1)
EW := E               CM := CMS
NOP                   RAM(1) := CM
NOP                   CM := RAM(2)
EW := E               CM := CMS
NOP                   RAM(2) := CM
NOP                   CM := RAM(3)
EW := E               CM := CMS
NOP                   RAM(3) := CM
NOP                   CM := RAM(4)
EW := E               CM := CMS
NOP                   RAM(4) := CM
NOP                   CM := RAM(5)
EW := E               CM := CMS
NS := EW              RAM(5) := CM





SOBEL FILTER CODE

Code                     Code
EW:=RAM 0; C:=0          EW:=W; NS:=RAM 16
EW:=E; NS:=RAM 8         RAM 25:=SM; C:=CY
RAM 16:=SM; C:=CY        EW:=RAM 17
EW:=RAM 1                EW:=W; NS:=RAM 17
EW:=E; NS:=RAM 9         RAM 26:=SM; C:=CY
RAM 17:=SM; C:=CY        EW:=RAM 18
EW:=RAM 2                EW:=W; NS:=RAM 18
EW:=E; NS:=RAM 10        RAM 27:=SM; C:=CY
RAM 18:=SM; C:=CY        EW:=RAM 19
EW:=RAM 3                EW:=W; NS:=RAM 19
EW:=E; NS:=RAM 11        RAM 28:=SM; C:=CY
RAM 19:=SM; C:=CY        EW:=RAM 20
EW:=RAM 4                EW:=W; NS:=RAM 20
EW:=E; NS:=RAM 12        RAM 29:=SM; C:=CY
RAM 20:=SM; C:=CY        EW:=RAM 21
EW:=RAM 5                EW:=W; NS:=RAM 21
EW:=E; NS:=RAM 13        RAM 30:=SM; C:=CY
RAM 21:=SM; C:=CY        EW:=RAM 22
EW:=RAM 6                EW:=W; NS:=RAM 22
EW:=E; NS:=RAM 14        RAM 31:=SM; C:=CY
RAM 22:=SM; C:=CY        EW:=RAM 23
EW:=RAM 7                EW:=W; NS:=RAM 23
EW:=E; NS:=RAM 15        RAM 32:=SM; C:=CY
RAM 23:=SM; C:=CY        EW:=RAM 24
RAM 24:=C                EW:=W; NS:=RAM 24
EW:=RAM 16; C:=0         RAM 33:=SM; C:=CY
                         RAM 34:=C





Md=29 


SN=iMg 


MG=39 


SN=:Md 


Md=:9 


SN=2Md 


Md=:9 


SN*?Mga 


WS=Sh WWU 
0=!Ma:Q=:SN 
{WS=°bb WW 

N=°SN 

{PE WWH="SN 
S°SN 

bE WVU=°SN 
'WS=3¢€% Wy 
N=?SN 

{€€ WWU=:SN 
S=!SN 

E€ WVH=°SN 
‘WS=3Zb Wyu 
N=:SN 

$7€ WWH=:SN 
S=°SN 

ZE WVH=°SN 
SWS= Tp WW 
N=:SN 

$T€ WWH=!SN 
S=°SNn 

T€ WYH=:SN 
‘WS=°0b WWUu 
N=3SN 

{0€ WWu=:SN 





apo 


qd00 YaLid TWdos 





Md=:5 


Md=2D 
SN=3Mq 


Md=?9 


SNeiMgd 


M229 


SN=?Ma 


Md=t) 


SN='Mq 


S=°SN 

Of WYu=:SN 

‘WS=:6f WYU 
N=?2SN 

£6Z WWU=:SN 
S=:SN 

6Z WVY=:SN 

fwS='ge WW 
N=:SN 

{62 WWoe=:SN 
S=°SN 

@Z WVU=:SN 

fWS=i24e€ WW 
N=:SN 

£42 WWU=:SN 
S=!SN 

£2 WVH=:SN 

{WS=:9C WVU 
N=:SN 

$9Z WWYU=:SN 
S=°SN 

92 WVH="SN 

{WS=:Gt WW 
N=:SN 

{GZ WVWU=:SN 
0='D *S=°SN 
GZ WWY=:SN 





spo) 








OQVHO109 ‘SNITIOD 1004 
SOINOYULIATAOYNIIN 


a * 
ot 


D='€1T WVU 
‘WS=:7Z1T WV 
‘71 WWU=!Mg 
‘WS= Tl Wu 
{lI WVWU="Mg 
‘WS=:0O1 WVU 

O=:9 

{OT WWH=:SN‘/M=°Mg 
SN=!Md 

S=:SN 

G WVU=:SN 

WS=:ZT Wvd 

=3SN $71 WVH="Ma 
AD='0 §WS=21T WVU 
O=:SN {11 Wwu=!Ma 
AD=?3D !WS=:OT WVY 
0=:D 

£OT WVU="°SN !M=!Mqg 
SN=:Mg 

N=:SN 

€ WVu=:SN 

WS=:7T WY 

O=:SN $71 WVU=?Ma 
AQ='D !WS=21Tt WVU 
O=:SN ‘TI WVU=:Md 
AD=29 {WS=:0T HRY 
0=:9 

{OT WVYH=:SN ‘a=:Mg 
SN=!Mq 

<4 WWu=:SN 


7) 


=O HMO mH 
Owego 
(toe Ho Uf 
ow OD ve 
YZOZY 





apo 





i WWH=:SN 

WS=:Z1 WWY 

O=SN $ZT WWU='Ma 
Ad='3 {WS=211 WWY 
O=:SN £11 WWu=:Mqa 
AD=?D !WS="01 WVU 
0=:5 

SOT WYU="SN !a=:Mg 
SN=° Md 
N=:SN 

T WVYU=:SN 

=2Z1T WWU 

AD=!D !WS=:TT WVU 
O=°M3 £11 WWu=:SN 
AD=?9 !WS=:0T WVU 
0=°35 

701 WWU=!mMqa {S=:SN 
9 WWH=:SN 

WS=?TT WWU 

O=°Ma {11 WWH=:SN 
AD=!D {WS=20T WY 
0=*9 

{OT WWH=:mMa ‘!N=:SN 
Z WVU= "SN 

D=2*1TT WV 

KO='9 §WS=:01 WV 
Q=*0 

£OT WWH=":SN ‘!M=! Ma 
b WVU="M9 

WS=?OT WVU 

aim 

0=:9 

£0=°SN $0 WWU='Md 





2pop 


4009 YaLTI4A AWOdS 


BC 
ie 
92 
GZ 
bd 
EZ 


Nae? wma an 


=~ 





auTT 


OdVHO109 ‘SNITIOD 103 
SDINOULOATIOVOIN 


WHAT IS A
CONTENT ADDRESSABLE MEMORY?

• STORED DATA ITEMS ARE ACCESSED BY A MATCH
BETWEEN A SEARCH WORD AND THE MEMORY CONTENTS
OR PART OF THE MEMORY CONTENTS (THE 'KEY')

• ALL STORED DATA ITEMS ARE SEARCHED IN PARALLEL

• THE MEMORY'S RESPONSE TO A MATCH VARIES ACCORDING
TO THE DESIGN AND PURPOSE OF THE MEMORY

• CONTENT ADDRESSABLE MEMORIES ARE OFTEN CALLED-
-ASSOCIATIVE MEMORY
-DATA ADDRESSED MEMORY
-PARALLEL SEARCH MEMORY


CAM OPERATION

• PRIMITIVES: SET RESPONDER, READ RESPONDER, WRITE RESPONDER,
COMPARE RESPONDER, COUNT-RESPONDER, FIND-FIRST

[Figure: array controller with comparand register and mask register above a memory array of memory words; each cell carries a tag that drives the response indicators]


ORGANIZATION OF DATA IN THE GAPP ARRAY

• WORD SERIAL, BIT PARALLEL DATA MUST BE
CONVERTED TO WORD PARALLEL, BIT SERIAL FORMAT

• TAG BIT MAINTAINED BOTH IN A MEMORY LOCATION
AND IN NS REGISTER

• RESPONDER SIGNAL CREATED FROM THE NS REGISTER
• EDGES OF ARRAY ARE TREATED AS DUMMY CELLS

• LAYOUT OF DATA IN MEMORY IN ONE PE

RAM ADDRESS = 0---4------- 4+M ---- 125 126 127
                  |-M BITS-|
                 CELL   CELL+M-1            TAG
                 (MSB)   (LSB)
lew 8 December 1984 
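The first bullet of this slide, converting word serial, bit parallel data to word parallel, bit serial format, is the "corner turn" that recurs throughout these slides. A minimal sketch of the idea in Python, with 8-bit words and MSB-first bit planes chosen only for illustration; the function names are hypothetical, and real hardware does this with shift registers rather than lists.

```python
def corner_turn(words, width=8):
    """Convert a list of words into bit planes, MSB plane first.

    planes[b][k] is bit b (counted from the MSB) of words[k], so each
    plane holds one bit of every word: word parallel, bit serial.
    """
    return [[(w >> (width - 1 - b)) & 1 for w in words] for b in range(width)]


def corner_turn_back(planes):
    """Rebuild the original words from MSB-first bit planes."""
    words = [0] * len(planes[0])
    for plane in planes:
        words = [(w << 1) | bit for w, bit in zip(words, plane)]
    return words


pixels = [0, 255, 128, 3]          # word serial, bit parallel input
planes = corner_turn(pixels)       # one list per bit position
print(planes[0], planes[7])        # MSB plane and LSB plane
print(corner_turn_back(planes))
```

Each processing element would then hold one column of `planes`, one bit per RAM address, which is the layout the slide sketches for a single PE.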


NOTATION OF ALGORITHMS

• FUNCTION_NAME(arg1, arg2)
BRIEF DESCRIPTION OF THE FUNCTION

• METHOD:
DESCRIPTION OF THE ALGORITHM

• ALGORITHM:
C LANGUAGE LIKE SYNTAX USED TO DESCRIBE
CONTROL STRUCTURES THAT ARE EXECUTED BY THE
CONTROL UNIT AND GAPP MNEMONICS THAT ARE
PASSED TO THE ARRAY VIA THE CONTROL UNIT

• TIME:
ASSUMES THAT THE ARRAY IS ALWAYS RUN WITH NO
WAIT STATES ADDED BY THE CONTROL UNIT


IMPLEMENTING THE COMPARE

• EFFECTIVE BIT WIDE IMPLEMENTATION THAT CAN BE
EXPANDED FOR MULTI BIT COMPARE

• COMPARE(addr, value):

FOR EVERY RESPONDER, COMPARE SETS THE TAG BIT
IF THE VALUE RESIDING AT THE RAM addr MATCHES
THE value ARGUMENT. addr IS A NUMBER BETWEEN 0
AND 127. value IS A BOOLEAN ARGUMENT.

• METHOD:

LOAD THE NS REGISTER WITH THE VALUE STORED AT
addr. LOAD THE EW REGISTER WITH value. EXNOR THE
EW AND NS REGISTERS AND PLACE THE RESULT IN THE
NS REGISTER. AND THIS WITH THE TAG BIT IN RAM
AND PLACE THE RESULT IN THE NS REGISTER AND IN
THE TAG LOCATION OF RAM. THIS BECOMES THE NEW
TAG.

• ALGORITHM:
/* LOAD THE NS AND EW REGISTERS */
IF( value == 0)
    EW=0; NS=RAM(addr); C=1;
ELSE{
    C=1;
    EW=C; NS=RAM(addr);
}

/* EXNOR INTO THE NS REG */
NS=RAM(TEMP); RAM(TEMP)=SM;
/* AND RESULT WITH TAG */
EW=RAM(TAG); C=0;
C=CY;
/* PLACE RESULTS IN RAM AND NS */
RAM(TAG)=C; NS=C;
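The COMPARE algorithm can be traced on a software model of one processing element. The sketch below assumes the usual full-adder outputs (SM = NS xor EW xor C, CY = majority of NS, EW, C), treats each statement line as one cycle, and picks RAM addresses 127 and 126 for TAG and TEMP purely for illustration; the `PE` class is a hypothetical stand-in, not NCR's simulator.

```python
from dataclasses import dataclass, field

TAG, TEMP = 127, 126   # hypothetical RAM addresses for the tag and scratch bits


@dataclass
class PE:
    """One single-bit processing element with 128 bits of RAM."""
    ram: list = field(default_factory=lambda: [0] * 128)
    ns: int = 0
    ew: int = 0
    c: int = 0

    @property
    def sm(self):
        """Sum output of the full adder."""
        return self.ns ^ self.ew ^ self.c

    @property
    def cy(self):
        """Carry output of the full adder (majority of NS, EW, C)."""
        return (self.ns & self.ew) | (self.ns & self.c) | (self.ew & self.c)


def compare(pe, addr, value):
    """Run COMPARE(addr, value) on one PE and return the cycle count."""
    if value == 0:
        pe.ew, pe.ns, pe.c = 0, pe.ram[addr], 1   # with C=1, SM = NOT NS
        cycles = 1
    else:
        pe.c = 1
        pe.ew, pe.ns = pe.c, pe.ram[addr]          # EW=1 and C=1, so SM = NS
        cycles = 2
    pe.ram[TEMP] = pe.sm                           # EXNOR result via TEMP
    pe.ns = pe.ram[TEMP]
    cycles += 1
    pe.ew, pe.c = pe.ram[TAG], 0                   # C=0 makes CY = NS AND EW
    cycles += 1
    pe.c = pe.cy
    cycles += 1
    pe.ram[TAG] = pe.c                             # new tag in RAM and in NS
    pe.ns = pe.c
    cycles += 1
    return cycles


pe = PE()
pe.ram[5] = 1
pe.ram[TAG] = 1
print(compare(pe, 5, 1), pe.ram[TAG])   # match keeps the tag set
print(compare(pe, 5, 0), pe.ram[TAG])   # mismatch clears the tag
```

In this model a compare against 0 takes five cycles and a compare against 1 takes six, which is where an average of 5.5 cycles per bit would come from when the comparand bits are evenly split.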


IMPLEMENTING THE EQUALITY SEARCH

• EXACT_MATCH:
SEARCH THE RESPONDERS OF THE ARRAY FOR AN
EXACT MATCH TO THE MASKED COMPARAND REGISTER

• METHOD:

USE THE COMPARE PRIMITIVE TO MATCH EACH BIT OF
THE MASKED COMPARAND REGISTER WITH THE WORDS
STORED IN THE ARRAY

• ALGORITHM:
/* LOOP FOR EVERY BIT IN THE WORD */
for (i=0; i<m; i++){
    if(mask(i) == 1) then {
        COMPARE(cell+i, Comparand(i));
    }
}

• TIME:

M * 5.5 CYCLES, WHERE M IS THE NUMBER OF BITS AND
THE COMPARISONS ARE EQUALLY DISTRIBUTED
BETWEEN 0 AND 1.
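The masked search loop can be sketched at the array level without modelling individual registers. In this illustration every cell is a list of M bits with bit 0 as the MSB, and the inner loop over cells stands in for what the array does in one parallel step; `exact_match` and its parameters are hypothetical names.

```python
def exact_match(words, tags, comparand, mask):
    """Bit-serial masked equality search over all cells.

    words     -- list of cells, each a list of M bits (bit 0 = MSB)
    tags      -- current responder tags, one per cell
    comparand -- list of M bits to search for
    mask      -- list of M bits; only positions with mask 1 are compared
    Returns the new tags: 1 only for responders whose unmasked bits match.
    """
    tags = list(tags)
    for i in range(len(comparand)):        # loop for every bit in the word
        if mask[i] == 1:
            # COMPARE(cell+i, Comparand(i)) applied to every cell at once
            tags = [t & int(w[i] == comparand[i]) for t, w in zip(tags, words)]
    return tags


cells = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 0, 1, 1]]
print(exact_match(cells, [1, 1, 1], comparand=[1, 0, 1, 0], mask=[1, 1, 1, 0]))
```

Because the last bit is masked out, the first cell still responds even though its final bit differs from the comparand.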


WRITING INTO THE ARRAY

• IT IS POSSIBLE TO LOAD THE ENTIRE ARRAY VIA THE
CM BUS BUT THIS IS NOT VERY EFFICIENT WHEN ONLY
ONE OR A FEW CELLS NEED TO BE WRITTEN TO.

• WRITE(addr, value):
WRITE THE BOOLEAN value INTO THE addr IN THE RAM
OF THE RESPONDING ELEMENTS.

• METHOD:

IF THE TAG IS SET THEN PLACE value IN LOCATION
addr. IF THE TAG IS NOT SET THEN PLACE THE
CURRENT CONTENTS OF addr IN LOCATION addr.

RESTORE THE TAG AT THE END OF THE ALGORITHM TO
ENSURE THAT MULTIPLE INVOCATIONS WORK
PROPERLY.


WRITING INTO THE ARRAY (CONTINUED)

• ALGORITHM:
/* Load contents of addr into ew */
EW=RAM(addr); C=0;

/* Produce logical AND of TAG, assumed to be in NS, */
/* with the contents of addr */
C=BW;
RAM(TEMP)=C; C=1;

/* Load value into EW, tag assumed to be in NS */
if( value == 0)
    EW=0; C=0;
else
    EW=C; C=0;

/* Logically AND value and tag */
C=CY;

/* load intermediate values in anticipation of OR */
NS=C; EW=RAM(TEMP); C=1;

/* perform OR and restore tag */
C=CY; NS=RAM(TAG);
RAM(addr)=C;
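The WRITE algorithm can be traced the same way. This sketch assumes that all assignments on one statement line take effect together, and that the borrow output is BW = (NOT NS AND EW) OR (NOT NS AND C) OR (EW AND C), so that with C=0 it yields the old contents only where the tag is clear. Those semantics, the `PE` class and the TAG/TEMP addresses are assumptions made for illustration.

```python
TAG, TEMP = 127, 126   # hypothetical RAM addresses for the tag and scratch bits


class PE:
    """One single-bit processing element (illustrative model only)."""

    def __init__(self):
        self.ram = [0] * 128
        self.ns = self.ew = self.c = 0

    @property
    def cy(self):   # carry: majority of NS, EW, C
        return (self.ns & self.ew) | (self.ns & self.c) | (self.ew & self.c)

    @property
    def bw(self):   # borrow, assumed as for NS - EW - C
        n = 1 - self.ns
        return (n & self.ew) | (n & self.c) | (self.ew & self.c)


def write(pe, addr, value):
    """WRITE(addr, value): store value at addr only where the tag is set.

    Expects the tag in both pe.ns and pe.ram[TAG] on entry.
    """
    pe.ew, pe.c = pe.ram[addr], 0                   # EW=RAM(addr); C=0
    pe.c = pe.bw                                    # C = old contents if tag clear
    pe.ram[TEMP], pe.c = pe.c, 1                    # RAM(TEMP)=C; C=1
    if value == 0:
        pe.ew, pe.c = 0, 0                          # EW=0; C=0
    else:
        pe.ew, pe.c = pe.c, 0                       # EW=C (still 1); C=0
    pe.c = pe.cy                                    # C = tag AND value
    pe.ns, pe.ew, pe.c = pe.c, pe.ram[TEMP], 1      # set up the OR
    pe.c, pe.ns = pe.cy, pe.ram[TAG]                # C = OR; restore tag in NS
    pe.ram[addr] = pe.c                             # RAM(addr)=C


def run(tag, old, value):
    """Helper: apply write() to a fresh PE and return the stored bit."""
    pe = PE()
    pe.ram[TAG] = pe.ns = tag
    pe.ram[3] = old
    write(pe, 3, value)
    return pe.ram[3], pe.ns


for tag in (0, 1):
    for old in (0, 1):
        for value in (0, 1):
            print(tag, old, value, run(tag, old, value))
```

Under these assumptions a tagged element ends up holding `value`, an untagged element keeps its old bit, and NS holds the tag again afterwards, so repeated calls behave as the method text describes.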


READING FROM THE ARRAY

• SHIFT OUT THE ENTIRE ARRAY VIA THE CM BUS
-EFFICIENT ONLY IF A LARGE PORTION OF THE
ARRAY IS OF INTEREST

• USE THE COMPARE FUNCTION
-THE COMPARE(addr, 0) FUNCTION PLACES THE
DATA AT addr ON THE RESPONDER SIGNAL WHERE
IT CAN BE SHIFTED INTO THE COMPARAND
REGISTER

-THE COMPARE(addr, 1) FUNCTION PLACES
INVERTED DATA ON THE RESPONDER SIGNAL

-THIS IS EFFECTIVE WHEN A COMPARE IS
REQUIRED IN ADDITION TO THE READ


READING FROM THE ARRAY (CONTINUED)

• READ(addr):

PLACE THE DATA AT addr OF RESPONDING ELEMENTS
IN THE NS REGISTER SO THAT IT PROPAGATES TO THE
RESPONDER OUTPUT AND CAN BE SHIFTED INTO THE
COMPARAND REGISTER.

• METHOD:

LOGICALLY AND THE RAM ADDRESS 'TAG' WITH THE
DATA AT addr AND PLACE THE RESULTS IN THE NS
REGISTER. BY USING THE TAG STORED IN RAM,
REPETITIVE CALLS TO THIS FUNCTION WILL WORK
PROPERLY BUT THE TAG IN THE NS REGISTER IS
GARBAGED.

• ALGORITHM:
/* Load ns with the TAG */
NS=RAM(TAG);

/* Load ew with data */
EW=RAM(addr); C=0;

/* AND tag and data */
C=CY;

/* place results in ns */
NS=C;

• TIME:
4 cycles
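A short trace of READ on a software model of one element shows why the result is the tagged data bit and why the count is four cycles. As a sketch it assumes CY is the majority of NS, EW and C, and uses hypothetical names (`PE`, `read`) and an illustrative TAG address of 127.

```python
TAG = 127   # hypothetical RAM address of the tag bit


class PE:
    """One single-bit processing element (illustrative model only)."""

    def __init__(self):
        self.ram = [0] * 128
        self.ns = self.ew = self.c = 0

    @property
    def cy(self):   # carry: majority of NS, EW, C
        return (self.ns & self.ew) | (self.ns & self.c) | (self.ew & self.c)


def read(pe, addr):
    """READ(addr): leave (tag AND data) in NS; return the cycle count."""
    pe.ns = pe.ram[TAG]                 # cycle 1: load ns with the tag
    pe.ew, pe.c = pe.ram[addr], 0       # cycle 2: load ew with data, clear C
    pe.c = pe.cy                        # cycle 3: CY = NS AND EW when C is 0
    pe.ns = pe.c                        # cycle 4: place result in ns
    return 4


responder = PE()
responder.ram[TAG] = 1
responder.ram[9] = 1
other = PE()
other.ram[9] = 1                        # same data, but tag not set

print(read(responder, 9), responder.ns) # tagged element exposes its data
print(read(other, 9), other.ns)         # untagged element contributes 0
```

Only the RAM copy of the tag is read, so the call can be repeated for successive addresses even though NS no longer holds the tag afterwards.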


ADDRESSING SPECIFIC ELEMENTS

• LOAD EACH PE WITH A UNIQUE ADDRESS THAT WILL
BE STORED IN RAM

• PERFORM AN EXACT_MATCH SEARCH FOR THE
ADDRESS TO SELECT A SINGLE ELEMENT

• READ THE CONTENTS OF THE RESPONDING ELEMENT

• FOR A 516 X 516 ARRAY OF ELEMENTS:
-EXACT_MATCH --> 114 CYCLES
(BASED ON 266,256 ELEMENTS REQUIRING 19
BITS OF ADDRESS)
-READ --> 32 CYCLES

-ASSUMING A 10 MHZ CLOCK THE ENTIRE
OPERATION TAKES 14.6 µSECS
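The figures on this slide can be reproduced with a few lines of arithmetic. The cycle counts are taken from the slide as given; the variable names are illustrative only.

```python
rows = cols = 516
elements = rows * cols                       # number of processing elements
address_bits = (elements - 1).bit_length()   # bits needed for unique addresses

exact_match_cycles = 114                     # from the slide
read_cycles = 32                             # from the slide
CLOCK_MHZ = 10                               # 10 MHz clock, from the slide

total_cycles = exact_match_cycles + read_cycles
total_us = total_cycles / CLOCK_MHZ          # cycles / (cycles per microsecond)

print(elements, address_bits, total_cycles, total_us)
```

The 114-cycle figure equals 19 address bits at 6 cycles each, which is the longer of the two COMPARE paths.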


WHAT IS AN ASSOCIATIVE PROCESSOR?

• INCLUDES ALL OF THE CAPABILITIES OF AN
ASSOCIATIVE MEMORY

• CAPABLE OF PERFORMING LOGICAL OR ARITHMETIC
OPERATIONS ON ALL DATA WORDS OF THE MEMORY IN
PARALLEL

• ASSOCIATIVE PROCESSORS ARE INHERENTLY SINGLE
INSTRUCTION MULTIPLE DATA (SIMD) MACHINES

• GENERALLY, SEARCHES ARE PERFORMED TO
IDENTIFY DATA ITEMS OF INTEREST (USING AM
FEATURES) AND THEN THESE ITEMS ARE OPERATED ON
USING THE AP FEATURES


CONCLUSION

• THE GAPP MAY BE USED IN ASSOCIATIVE MEMORY
DESIGNS BY:
-EMULATING BIT PARALLEL OPERATION

-PAIRING IT WITH THE APPROPRIATE CONTROL
STRUCTURE

-USING THE GLOBAL OUTPUT AS AN OUTPUT PORT

• THE GAPP MAY BE USED IN ASSOCIATIVE PROCESSOR
DESIGNS BY:

-UTILIZING THE SIMD NATURE OF THE DEVICE

-UTILIZING THE POWERFUL INSTRUCTION SET


GAPP PERFORMANCE
COMPARISONS

IMAGE PROCESSING
APPLICATIONS

TYPICAL SYSTEMS                  1 TO 50 MIPS
1 GAPP DEVICE (6*12)             28 MIPS
32 GAPP DEVICES (48*48)          900 MIPS
3200 GAPP DEVICES (512*512)      100,000 MIPS
                                 (100,000,000,000
                                 INSTRUCTIONS PER SECOND)


SdOTIW O29 
SdOTdW 9°2 


SdOTIW 2°0 


SdO1SW OOF 93 OOL 


SdOT4IW O2 97 OL 


S3DIA30 dd¥9 OOZE 
S2IIA3G dd¥9 2¢ 
JIIAIN ddvO 4 


S(t2 Y3BAd 
dli-X AVUD 
SIWVUINIVN WO 


02089 “98208 


ZE/MIN “XVA 


THAW “Td “VO Ita 2¢ = WNOTiv¥sdo 1) 


IWV MANET VW 
ANTOd ONTAVO1s LIa 2¢< 








TOPICS:

BASIC ALGORITHMS
CONTENT ADDRESSABLE MEMORY
ADVANCED ALGORITHMS

BASIC ALGORITHMS
(PRIMITIVES)


CIP) 


°0-=:9 af [a 


*T=:9 ((Q<c1ea) 3 T )gt 


} 
(T84°qQ)NIOD 


"A9YSTHOI DQ 3YQ OYUT FA anyeBa ey Jo g IQ YYU BYY SBynd UOoOTJOUNY sty] 


LNANI ‘'IVGO1D 





OdGVHO109 ‘SNITIOD LuO4 
SOINOYULOATAOYNOIN 





rab CT UC AWYE 


AD 32 
. N 
"O° 'DO Det CT UC AJNVE 
to MA & SN FO PENS oy Sto Ag e/ >. PCX INV SSN AQU') 


(i:f fy ie ae | ob) aay 
COX WW SS 
02°39 (ha) WVU: MM 

} 

(4'ASX)MVU 


OS yb ge Aequtey U tay, paom op), st uo pue fssouppe Ju1,4eys pasos yndyno oy] st 


. 


"SUL ppe RULZIOS prem yudur syy aq KX pues Pelz ayy yo SSOAPPS WVH FU) 9G gqoye'l 


"aN TRA Yysew 2 se UL paproey, 40 vOTAR[NOLBS Vv go ypnsaa vig 
Ma yiq SETS ayy, “torjyesadas pue Suyuotytpuos eyep a0oy [ngasn st ucTyOUNZ sty y, 


(WOM GNV LIL 





OOVEO109 ‘SNITIOD 1404 
SOINOHLIATIONOIN 


2, 


prereset 


‘y st 
Woy 


phaed png 


/* 


{/% ASONH HET 


eS ae (F 
ed | 
AN PLA NEW Fl 
beqagy ute | 


OM pat 


shes py ergy, 
v9yUq ayepdn 
Niue ff 
Autyyou og 
‘SN  } JAW 


/* Urs ats anv 


/t “© Pour 


*t WS 


wha ] MUpA 


Mier .d J 


EQ Uh) 


0] QUSw Mos4y dood 


OALPREPUUY 


+/ 


C(Sytq ju 


0 °SN 
“dou 


Qe. 


“NS CAVE OCd 
SN- MH eS ge 
(yt ryot) st 
J. °SN 
AQ OY 
‘WYN - SSN dl 
} 


“CY tgent tp-ulry ay 


a (ee 


AOGuny ) 


oat | rare a Me 0 | 
O-°MY O° ISN 
("Y)XVWD 


azIs p4som 


ssouppe duypg 


u 
ssa.1ppe p2eM WVHA = au 
4d 


St sazypsistaa 


*SUGTJBIDTT 
MA PUL 


}tq-a{diyzpnw sy 
"ANTEA WUNWE Xe 
u#O[J 8 S]yHS pue Aestze ayy ul 


ayy 


any_TeaA wNeLxXeMm ayy Spry 


spproy 


Yoavo Ul 


eu 


BO TJ JPUSIIND ayy PLOY 0) 
HRY Ad 


yigq 


VOTPIUN] STU 


LOALAG WAWIXVW 


‘1VUo'rD 


O0VHO109 ‘SNITIOD 1404 
SOINOYLO3 TS0OH0IN 








a do ppnsar ¢/ ‘2 WWII tia 
pH Vip Po soy dup yo e/ Ah) WVU ESN 0:4 
it MH. () 40 €) ¥/ M@s'9 WVH='Md 07° SN O:k 
re AN tl Vo3¥ di */ . [Oo tO: WV O-d 
Jt . Vo youd ¢/ PAs 3D WVU OMA tiv 
i ¥ AQ Wova od #7 "O° WWE CAG il 

i] 

{#4#86 (Us T tg - ©) aug 


{Ca )WVH= °SN 


bh) XU 


(ho ‘Oo ‘wu fy 


st 


ssaippe #e lg d 
J[Nsay ayy, Jo Ssvaappoe Burpsuyy — % 


puessde q so ssasppe Burzierys = W 
puesade yo yo ssaappe BaP - ¥ 
ssguppe afffanoys Asesadmay = £ 4's 


“$E Ea AY Slog pub sszoqund omy, ayy ase A pun X otoum 


(ai 9 V) 370 (4 8 HW) = 9 


:SU passaudxe og uvy Ssttpy, YSE[dttacsse oF MYPFAoRE OY 
‘dnypy Be Aq sazaqunu omy, Jo auoO PIaATAS OF ST WoT youNJ pajrsap A, yuanbouy y 


HOX Ad J J‘) LNW 





OOVHO109 'SNIT1IOD 1404 
SOINOY LOATAONOIWN 


2, 


Lndino 8610 Va 


INdNI O00 WVa 








OOVHO109 ‘SNITIOD 104 
SOINOULOITSONOIN 


EIRD | 


-O=: (LD) WVU 

/* ((Z 9 AL) BX) SI AD ¥€/ ‘AD= 50 
/* (2°39 AL) St MG &/ S0=:5N MO=:9 (O)WVU="'MH 
*H='ME MA 3 SN 

‘a= M4 


‘O='9) (O)NVA= i Ma 
} 
()HOLVWdWL 


“7sUq 

24} 07 sa0eTd Z ad FYR UT enzTBaA ayy st Z pus 4seg ayy 

OF AOQYsToOU SzelLpaMMT yy UT aN[eA ayy SE 4 ‘dd 947 78 

SNTeA ayy ST X atayuM 47 3 R_ BY X JO UOT eIad0 [BITZ0] 
94} w4ofuad pue sanrea Zutyozewe sv syndut [eqo{s asq :poyyep 


wlOla. JO SP0UBIANDDO TI eB Bd0q :a,dmexg 


HOLVN LOVXA —- ONIHOLVW ALVIUWAL 





OQVYUO109 *SNIT109 1404 
SOINOYLOATAOYOIN 





ILLUSTRATION OF DILATION AND EROSION

[Figure: an original image shown with its dilation in 8 directions (OR-ing with neighbors) and its erosion in 8 directions (AND-ing with neighbors)]





DILATION

Dilation is the expansion of an image in the algorithmically determined
direction(s).

Method: Extend single bit plane image by shifting the bit and 'or-ing'
it with its neighbor.

Example 1: Dilate image in (1,0) direction (north).

NS:=RAM(0) C:=1;          /* Load NS with input image */
NS:=S EW:=NS;             /* Shift image north one pixel */
C:=CY;                    /* 'OR' image with shifted image */
RAM(1):=C;                /* Store results in RAM */

Example 2: Dilate image in (+-1,0) (0,+-1) directions (north, south, east,
and west), where the pixel neighborhood is defined by:

  D
C A B
  E

NS:=RAM(0) C:=1;          /* Load (A) into NS */
NS:=S EW:=RAM(0);         /* Shift (E) into NS and (A) into EW */
C:=CY NS:=RAM(0);         /* C=(A)+(E), Load (A) into NS */
EW:=C C:=1 NS:=N;         /* EW=(A)+(E), Shift (D) into NS */
C:=CY;                    /* C=(A)+(D)+(E) */
NS:=C EW:=RAM(0) C:=1;    /* NS=(A)+(D)+(E), Load (A) into EW */
EW:=E;                    /* Shift (B) into EW */
C:=CY;                    /* C=(A)+(B)+(D)+(E) */
NS:=C C:=1 EW:=RAM(0);    /* NS=(A)+(B)+(D)+(E), Load (A) into EW */
EW:=W;                    /* Shift (C) into EW */
C:=CY;                    /* C=(A)+(B)+(C)+(D)+(E) */
RAM(1):=C;                /* Store result in RAM */
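The shift-and-OR method can be sketched on an ordinary bit plane held as a list of rows. This is an array-level illustration of the same idea rather than GAPP code; `shift` and `dilate4` are hypothetical names, and pixels shifted in from outside the image are taken as 0, in the spirit of an array with no wrap around.

```python
def shift(img, dy, dx):
    """Return a copy where result[y][x] = img[y+dy][x+dx], 0 outside the image."""
    h, w = len(img), len(img[0])
    return [
        [img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
         for x in range(w)]
        for y in range(h)
    ]


def dilate4(img):
    """Dilate a single bit plane north, south, east and west by one pixel."""
    out = [row[:] for row in img]                       # start with (A)
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # the four neighbors
        neighbor = shift(img, dy, dx)
        out = [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(out, neighbor)]
    return out


image = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
for row in dilate4(image):
    print(row)
```

A single set pixel grows into a plus shape, which is the five-pixel neighborhood A, B, C, D, E of Example 2.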


CONTENT ADDRESSABLE MEMORY

WHAT IS A
CONTENT ADDRESSABLE MEMORY?

STORED DATA ITEMS ARE ACCESSED BY A MATCH BETWEEN
A SEARCH WORD (THE "KEY") AND THE SPECIFIED PORTION
OF THE CELL MEMORY CONTENTS. THE REMAINING MEMORY
CONTENTS OF THE CELL ARE USED AS THE DATA.

ALL STORED DATA ITEMS ARE SEARCHED IN PARALLEL.

THE MEMORY'S RESPONSE TO A MATCH VARIES ACCORDING
TO THE DESIGN AND PURPOSE OF THE MEMORY.

CONTENT ADDRESSABLE MEMORIES ARE OFTEN CALLED-
-ASSOCIATIVE MEMORY
-DATA ADDRESSED MEMORY
-PARALLEL SEARCH MEMORY


CAM OPERATION

PRIMITIVES: SET RESPONDER, READ RESPONDER, WRITE RESPONDER,
COMPARE RESPONDER, COUNT RESPONDER, FIND FIRST RESPONDER

[Figure: array controller with comparand register and mask register above the memory array; each cell holds data and a tag]


ORGANIZATION OF DATA IN THE GAPP
ARRAY

WORD SERIAL, BIT PARALLEL DATA MUST BE CONVERTED TO
WORD PARALLEL, BIT SERIAL FORMAT.

TAG BIT IS MAINTAINED IN BOTH A MEMORY LOCATION AND IN
THE NS REGISTER.

RESPONDER SIGNAL IS CREATED BY THE NS REGISTER USING THE
GLOBAL OUTPUT SIGNAL.

MEMORY ALLOCATION WITHIN EACH GAPP PE:

RAM ADDRESS = 0 ...... K ...... 126 127
              |-K BITS-|- BITS -|
              |  KEY   |  DATA  | TAG
              (MSB)        (LSB)


NOTATION OF ALGORITHMS 


efUNCTION NAME(are!, argZ/, 
BRIEF DESCRIPTION OF THE FUNCTION 


eMETHOD: 
DESCRIPTION OF THE ALGORITHM 


3 
C LANGUAGE LIKE SYNTAX USED TO DESCRIBE 
CONTROL STRUCTURES THAT ARE EXECUTED BY THE 
CONTROL UNIT AND GAPP MNEMONICS THAT ARE 
PASSED TO THE ARRAY VIA THE CONTROL UNIT 


eT IME 
ASSUMES THAT THE ARRAY IS ALWAYS RUN WITH NO 
WAIT STATES ADDED BY THE CONTROL UNIT 


IMPLEMENTING THE COMPARE 


eEFFECTIVE BIT WIDE IMPLEMENTATION THAT CAN BE 
EXPANDED FOR MULTI BIT COMPARE 


slLOMPARL (addr, value): 

FOR EVERY RESPONDER, COMPARE SETS THE TAG BIT 
IF THE VALUE RESIDING AT THE RAM addr MATCHES 
THE va/ue ARGUMENT. addr IS A NUMBER BETWEEN 0 
AND 127. va/ve IS A BOOLEAN ARGUMENT. 





eMETHOD: 

LOAD THE NS REGISTER WITH THE VALUE STORED AT 
addr. LOAD THE EW REGISTER WITH va/ue EXNOR THE 
EW AND NS REGISTERS AND PLACE THE RESULT IN THE 
NS REGISTER. AND THIS WITH THE TAG BIT IN RAM 
AND PLACE THE RESULT IN THE NS REGISTER AND IN 
THE TAG LOCATION OF RAM. THIS BECOMES THE NEW 
TAG. , 


eALGORITHM 
/* LOAD THE NS AND EW REGISTERS */ 
IF( va/ue == 0) 

EW=0: NS=RAM( addr); C=1; 
ELSE{ 

C=1; 

EW=-C: NW=RAM( sacar): 


} 

/*EXNOR INTO THE NS REG*/ 
NS=RAM(TEMP); RAM(TEMP) = SM; 
/* AND RESULT WITH TAG *7 
EW=RAM(TAG); C=0: 

C=CY; 

/*PLACE RESULTS IN RAM AND NS*/ 
RAM(TAG )=C; NS=C: 


66 


IMPLEMENTING THE EQUALITY SEARCH 


oLKACT MALE 
SEARCH THE RESPONDERS OF THE ARRAY FOR AN 
EXACT MATCH TO THE MASKED COMPARAND REGISTER 


eMETHOD: 

USE THE COMPARE PRIMITIVE TO MATCH EACH BIT OF 
THE MASKED COMPARAND REGISTER WITH THE WORDS 
STORED IN THE ARRAY 


eALGORITHM: 
/* LOOP FOR EVERY BIT IN THE WORD */ 
for (i=0; i<cm; i++ ){ | 
if(mask(i) == 1) then { 
COMPARE(celiti, Comparand(i)); 
} 
} 


eTIME: , 

M * 5.5 CYCLES, WHERE M IS THE NUMBER OF BITS AND 
THE COMPARISONS ARE EQUALLY DISTRIBUTED 
BETWEEN 0 AND |. 


WRITING INTO THE ARRAY 


elT IS POSSIBLE TO LOAD THE ENTIRE ARRAY VIA THE 
CM BUS BUT THIS IS NOT VERY EFFICIENT WHEN ONLY 
ONE OR A FEW CELLS NEED TO BE WRITTEN TO. 


e¥RTEaddr, value) 
WRITE THE BOOLEAN va/ue INTO THE addr IN THE RAM 


OF THE RESPONDING ELEMENTS. 


eMETHOD: 

IF THE TAG IS SET THEN PLACE va/ue IN LOCATION 
addr. IF THE TAG IS NOT SET THEN PLACE THE 
CURRENT CONTENTS OF addr IN LOCATION addr. 


RESTORE THE TAG AT THE END OF THE ALGORITHM TO 
ENSURE THAT MULTIPLE INVOCATIONS WORK 
PROPERLY. 


6 


WRITING INTO THE ARRAY(CONTINUED) 


eALGORITHM: 
/*Load contents of addr into ew */ 
EW = RAM( addr); C=0; 


/*Produce logical AND of TAG , assumed to be in NS, */ 
/* with the contents of addr */ 
C=BY; 

RAM(TEMP)=C; C=1; 


/*Load va/ue into EW, tag assumed to be in NS */ 
if( vasue == 0) 

EW=0; C=0; 
else 

EW=C; C=0; 


/* Logically AND va/ueand tag */ 


CC; 


/*load intermediate values in anticipation of OR*/ 


NS=C; EW=RAM(TEMP ); C=1; 


/*perform OR and restore tag*/ 
C=CY; NS=RAM(TAG); 
RAM( addr)=C; 


READING FROM THE ARRAY 


eSHIFT OUT THE ENTIRE ARRAY VIA THE CM BUS 
~EFFICIENT ONLY IF A LARGE PORTION OF THE 
ARRAY IS OF INTEREST 


eUSE THE COMPARE FUNCTION 
-THE COMPARE(addr.@/ FUNCTION PLACES THE 
DATA AT addr ON THE RESPONDER SIGNAL WHERE 
IT CAN BE SHIFTED INTO THE COMPARA 
REGISTER 


-THE COMPAR E(addr,_{/ FUNCTION PLACES 
INVERTED DATA ON THE RESPONDER SIGNAL 


-THIS IS EFFECTIVE WHEN A COMPARE IS 
REQUIRED IN ADDITION TO THE READ 


65 


READING FROM THE ARRAY (CONTINUED)


•READ(addr)

PLACE THE DATA AT addr OF RESPONDING ELEMENTS
IN THE NS REGISTER SO THAT IT PROPAGATES TO THE
RESPONDER OUTPUT AND CAN BE SHIFTED INTO THE
COMPARAND REGISTER.


•METHOD

LOGICALLY AND THE RAM ADDRESS 'TAG' WITH THE
DATA AT addr AND PLACE THE RESULTS IN THE NS
REGISTER. BY USING THE TAG STORED IN RAM,
REPETITIVE CALLS TO THIS FUNCTION WILL WORK
PROPERLY BUT THE TAG IN THE NS REGISTER IS
OVERWRITTEN.


•ALGORITHM:
/* Load NS with the TAG */
NS = RAM(TAG);

/* Load EW with data */
EW = RAM(addr); C = 0;

/* AND tag and data */
C = CY;

/* Place results in NS */
NS = C;


•TIME
4 CYCLES
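A software model of READ helps show why the tag must select a single element. The sketch below assumes the responder output behaves as a wired OR of every NS register; `read_bit` and `read_word` are hypothetical names, and the bit-plane layout (one list per RAM address) is an illustrative choice.

```python
def read_bit(tags, plane):
    """AND the tag with one RAM bit-plane; return (NS per element, responder)."""
    ns = [t & d for t, d in zip(tags, plane)]   # C = CY with C = 0, then NS = C
    responder = int(any(ns))                    # assumed wired OR of all NS registers
    return ns, responder


def read_word(tags, planes):
    """Repeat read_bit for each bit-plane, least significant plane first."""
    word = 0
    for i, plane in enumerate(planes):
        _, bit = read_bit(tags, plane)
        word |= bit << i                        # shifted into the comparand register
    return word


if __name__ == "__main__":
    words = [3, 5, 6, 9]
    planes = [[(w >> i) & 1 for w in words] for i in range(4)]
    print(read_word([0, 0, 1, 0], planes))      # one responder selected
    print(read_word([1, 1, 0, 0], planes))      # two responders are OR-ed together
```

With more than one responder the result is the OR of the selected words, which is why an exact-match search normally precedes the read.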


ADDRESSING SPECIFIC ELEMENTS

•LOAD EACH PE WITH A UNIQUE ADDRESS THAT WILL
BE STORED IN RAM


•PERFORM AN EXACT_MATCH SEARCH FOR THE
ADDRESS TO SELECT A SINGLE ELEMENT


•READ THE CONTENTS OF THE RESPONDING ELEMENT

•FOR A 516 X 516 ARRAY OF ELEMENTS:
-EXACT_MATCH --> 114 CYCLES
(BASED ON 266,256 ELEMENTS REQUIRING 19
BITS OF ADDRESS)
-READ --> 32 CYCLES


-ASSUMING A 10 MHZ CLOCK THE ENTIRE
OPERATION TAKES 14.6 µSECS
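The figures on this slide can be reproduced with a few lines of arithmetic. This sketch only re-derives the slide's numbers; the function names are hypothetical, and the cycle counts are taken from the slide rather than measured.

```python
def address_bits(rows, cols):
    """Number of address bits needed to give every element a unique address."""
    elements = rows * cols
    return elements, (elements - 1).bit_length()


def lookup_time_us(exact_match_cycles, read_cycles, clock_hz):
    """Total time in microseconds for one search followed by one read."""
    return (exact_match_cycles + read_cycles) / clock_hz * 1e6


if __name__ == "__main__":
    elements, bits = address_bits(516, 516)
    print(elements, bits)                    # element count and address width
    print(114 / bits)                        # cycles spent per address bit
    print(lookup_time_us(114, 32, 10e6))     # search plus read at 10 MHz
```

The total of 146 cycles at 100 ns per cycle is 14.6 microseconds, which is the reason the unit on the slide is read as microseconds.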


WHAT IS AN ASSOCIATIVE PROCESSOR?


•INCLUDES ALL OF THE CAPABILITIES OF AN
ASSOCIATIVE MEMORY


•CAPABLE OF PERFORMING LOGICAL OR ARITHMETIC
OPERATIONS ON ALL DATA WORDS OF THE MEMORY IN
PARALLEL


•ASSOCIATIVE PROCESSORS ARE INHERENTLY SINGLE
INSTRUCTION MULTIPLE DATA (SIMD) MACHINES


•GENERALLY, SEARCHES ARE PERFORMED TO
IDENTIFY DATA ITEMS OF INTEREST (USING AM
FEATURES) AND THEN THESE ITEMS ARE OPERATED ON
USING THE AP FEATURES


CONCLUSION 


•THE GAPP MAY BE USED IN ASSOCIATIVE MEMORY
DESIGNS BY:

-EMULATING BIT PARALLEL OPERATION

-PAIRING IT WITH THE APPROPRIATE CONTROL
STRUCTURE

-USING THE GLOBAL OUTPUT AS AN OUTPUT PORT


•THE GAPP MAY BE USED IN ASSOCIATIVE PROCESSOR
DESIGNS BY:

-UTILIZING THE SIMD NATURE OF THE DEVICE

-UTILIZING THE POWERFUL INSTRUCTION SET


ADVANCED ALGORITHMS
SKELETONIZATION

MANY PATTERN RECOGNITION APPLICATIONS [...]
BE TRANSFORMED INTO ITS SKELETON, CONSISTING OF [...]
WHILE RETAINING THE SHAPE AND TOPOLOGY OF THE [...]

[...] MAY BE ACCOMPLISHED BY THE FOLLOWING SEQUENCE OF OPERATIONS:
CLEAN PATTERN
FIND CONNECTING POINTS
CLEAN PATTERN
FIND END POINTS
THIN PATTERN

THIS METHOD IS DESCRIBED IN DETAIL IN A PAPER WRITTEN BY [...]
"SKELETONIZATION OF [...]" [...] 1972.





[Figure: POINT DEFINITION FOR SKELETON, showing a CONNECTING POINT and an END POINT]





OPERATIONS

CLEAN PATTERN: THIS OPERATION ELIMINATES STRAY [...]
CAUSE FALSE CONNECTIONS [...] IN THE PATTERN [...]

FIND CONNECTING POINTS: THIS OPERATION IDENTIFIES "CONNECTING
POINTS" IN AN IMAGE AND FLAGS THEM SO THEY ARE NOT ERODED AWAY
DURING THE THINNING PROCESS.

FIND END POINTS: THIS OPERATION IDENTIFIES "END POINTS" IN THE
PATTERN.

THIN: THIS OPERATION ERODES AWAY THE PATTERN DOWN TO ITS
SKELETON. CONNECTING POINTS AND END POINTS ARE PRESERVED.


[Figure: three example panels, each labeled OUTPUT]





RELAXATION

[...]
SOLUTION IS DESIRABLE. EXAMPLES OF SUCH PROBLEMS INCLUDE:
* HEAT FLOW
* INCOMPRESSIBLE FLUID FLOW
* ELECTRICAL POTENTIALS

GIVEN THAT THE BOUNDARY CONDITIONS ARE FIXED, THE VALUE AT
EACH POINT IN THE SURFACE OR VOLUME MAY BE ESTIMATED BY
AVERAGING THE VALUES OF ITS NEIGHBORING POINTS.




EXAMPLE

[...]

COMPUTE AVERAGE OF ADJACENT CELLS UNTIL VALUES CONVERGE [...]
RELAXATION ON THE GAPP

AN ARRAY OF GAPP DEVICES MAY BE USED TO SOLVE A TWO DIMENSIONAL
RELAXATION UTILIZING THE NEAREST NEIGHBOR CONNECTIONS. THE NEW
VALUE AT EACH POINT IS SIMPLY THE AVERAGE OF THE VALUES OF ITS
EIGHT NEIGHBORS.

THE PARALLEL ARCHITECTURE OF THE GAPP ALLOWS ALL POINTS TO BE
CALCULATED SIMULTANEOUSLY.

THE ALGORITHM IS FINISHED WHEN THE PREVIOUS VALUE OF EVERY [...]
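The eight-neighbor averaging step can be sketched in ordinary floating-point code. This is an illustration only: it is not bit-serial GAPP code, the name `relax` is hypothetical, and the stopping rule (stop once no point changes by more than `tol`) is an assumption, since the slide's own wording of the termination test is cut off.

```python
def relax(grid, tol=1e-6, max_iters=10000):
    """Relax interior points toward the average of their eight neighbors.

    Edge cells are treated as fixed boundary conditions.
    Returns (relaxed grid, number of sweeps performed).
    """
    rows, cols = len(grid), len(grid[0])
    cur = [row[:] for row in grid]
    for sweep in range(1, max_iters + 1):
        nxt = [row[:] for row in cur]           # all points updated from the old values
        delta = 0.0
        for r in range(1, rows - 1):
            for c in range(1, cols - 1):
                total = sum(cur[r + dr][c + dc]
                            for dr in (-1, 0, 1)
                            for dc in (-1, 0, 1)
                            if (dr, dc) != (0, 0))
                nxt[r][c] = total / 8.0
                delta = max(delta, abs(nxt[r][c] - cur[r][c]))
        cur = nxt
        if delta < tol:                         # assumed convergence test
            return cur, sweep
    return cur, max_iters


if __name__ == "__main__":
    n = 5
    plate = [[100.0 if r in (0, n - 1) or c in (0, n - 1) else 0.0
              for c in range(n)] for r in range(n)]
    result, sweeps = relax(plate)
    print(sweeps, result[2][2])
```

Building each sweep from a copy of the previous one mirrors the simultaneous update of all points on the array.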





"G3SN Juv SdssdNG J9GK dl Wee 
"AINISIDI4IF JAOUdWI OL YISANG 3903 ON SAWNSSVs 





«NO! LV 144u09 

W22 ae Ach 2 = SAL‘ 1 LZ2XL2 AUVNIG ‘9 
CWIIdAL) «NO! LA IOANOD 

W6c1 WZ A8S SNEld GXS LId 8 °S 
CWIIdAL) #NOLLATOANOD 

WL 9E W22 ADO" SABBL exe LId 8 ‘4 
#4dl iid ssavyo 

WEDS WOE ALSS SAL LL exe LI8 BE 

de'l Whcl WI"? SAL’ L2 Ald|LINW LIG Be 

di'tl W126 W8° 82 SNG* 2 gqgv ilg 8 cl 





(Sdd (ddV9_1) 
AVauy | avy 2k x9 


G4SS490Ud GNOIAS Yad STIX Id NOI LV INI W9 





Purposes of NCR GAPP Development Support

- Education: SIMD arrays require new programming techniques
- Evaluation of the GAPP processor
- Early Algorithm Development
- Transfer of Algorithms into Production systems





GAPP Development Support Products

Current Products:
- GAL: GAPP Algorithm Language
- GAPP Simulator / Assembler
- GAPP PC Development System

Planned Products:
- Multibus Board Set

Future Products: Customer Feedback





Purposes of a GAPP High Level Language

• Programmer Productivity
• Modular Software Design
• Specific features unique to GAPP processors:
  • Looping to take advantage of the bit-serial nature of the GAPP processor
  • Use of symbol names to refer to GAPP RAM locations; allows easy changes in RAM usage and storage structure





Features of the GAL Language

• GAPP instructions embedded in C control statements
• Block oriented, structured language
• Subroutine calls for modularity: integer expressions and GAPP RAM addresses may be passed to a subroutine
• Integer variables and [...] operations, for "housekeeping" variables, flags for conditional execution, RAM address indexes
• Conditional execution based on status of the GAPP Global Output
• "While" and "for" statements for bit-serial looping





Image Variables

• Image variables are the "parallel" variables of the GAL language (integer variables are the "scalar" variables).
• A collection of adjacent RAM bit-planes used to hold an array of numbers. Also known as an "image".

[Figure: an IMAGE made up of RAM BIT PLANEs]





E— Ww OGVUO109 ‘SNITIOD 1404 
SOINOYHLOIIZOYOIN 


BIE. 


as : \ ~: a 2? aa: } Me Pe Sy SEBO {. . 
lauddS - ALI wn - 401AN4S 


4 | . 


‘Q ANjEA ayy SUINJal (\y)azZIS :xy ‘ayqeueA 
abeuw ue ul Sauejd—jig Jo JaqUUNU ay) SUsNJal JOyeJadoO .azIs, aU] « 
: AD=:9 WS=(Fy)WeS 
(ajqeuen Jobat ue sit) = -AD=:0 ws=—"wel ry 
YONOSNASU dy e Ut ajqeuen abewn ue Huisp « 
Ajuo jeqoj6 “Oy abeun 
(SNe Ne) jed0j JO jeqoj6 ‘oy abeun 
‘ajqeueA abeun ue Buiejoag « 
SUONEIO] WYHY ddV 0} adUaJaja1 DIOQUIAS 40} pass) « 


(quod) sejqeue, abeusj 





Example: a short subroutine to add two image variables

• Assume that image variables A and B are the same size, and C is one bit-plane larger (at least).

[Figure: Add, Result]
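The addition of two image variables can be modelled plane by plane in software. The sketch below is not GAL; `to_planes`, `from_planes`, and `add_images` are hypothetical helper names, and each bit-plane is represented as a Python list with one bit per processor element.

```python
def to_planes(values, nbits):
    """Split integers into bit-planes, least significant plane first."""
    return [[(v >> i) & 1 for v in values] for i in range(nbits)]


def from_planes(planes):
    """Reassemble integers from bit-planes."""
    count = len(planes[0])
    return [sum(planes[i][k] << i for i in range(len(planes)))
            for k in range(count)]


def add_images(a, b):
    """Bit-serial add of two equal-size images; result is one plane larger."""
    carry = [0] * len(a[0])                   # carry register cleared first
    result = []
    for pa, pb in zip(a, b):                  # one pass per bit-plane
        sm = [x ^ y ^ c for x, y, c in zip(pa, pb, carry)]
        carry = [(x & y) | (x & c) | (y & c) for x, y, c in zip(pa, pb, carry)]
        result.append(sm)                     # sum bit goes to the result plane
    result.append(carry)                      # final carry fills the extra plane
    return result


if __name__ == "__main__":
    A = to_planes([3, 200, 255, 0], 8)
    B = to_planes([5, 100, 255, 0], 8)
    print(from_planes(add_images(A, B)))
```

The extra plane in the result is what the slide's size assumption for C provides room for.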





G- Ww OGVHO103 ‘SNITIOD LWO4 
; SDINOYULIATIOYUOIN 


IRIN. 


! , } ‘ : s | + }. } J ( : Paden —wt 


Ng eM Tye ie? een, b™ Me aes 
LdUddIS - ALI WAD - FDLAYAS 





PADS 1D MS= Mel Ty 

TW OM OT a 

MRI! SU Ty 
Po (stb SCV) @zEs > £ tg = 4) toy 
-Q- are) 


my yur 


12 ‘a fy avnay 


(9 ‘A *¥) PPE 





GAPSIM: GAPP Simulator / Assembler

- Available for VAX / VMS and 4.2 Berkeley UNIX.
- Accepts commands from the user, or a batch "do" file, as input.
- Outputs "snapshots" of RAM and PE register data.
- Useful for debugging algorithm modules or subroutines in a multi-user environment, to be later merged to form the complete algorithm.





GAPSIM Commands

Simulator commands allow the user to:

• Load data from a file into simulated GAPP RAM, or store RAM data into a file
• Display GAPP RAM or PE register data to the screen
• Configure the size of the simulated GAPP array
• Execute a single GAPP instruction, or execute a "do" file with a sequence of GAPP instructions or other simulator commands





GAPSIM "do" Files

• The GAPP instructions must be a sequential list of GAPP assembler mnemonics with absolute RAM addresses (in hexadecimal). All "automatic" image address references must be resolved.
• No conditional execution based on the GAPP Global Output.
• Breakpoints may be set in the "do" file by the command "bkpoint".
• The GAL compiler program "gc" will translate a GAL program into a simulator "do" file.





GAPP PC Development System

• Available for MS-DOS and VENIX/86 on any IBM PC-compatible, including NCR PC4, PC6 and PC8.
• Menu-driven user interface with single key commands.
• Complete GAL program debugger.
• Hardware with two GAPP devices, two corner-turn devices, and TTL interface to PC I/O bus.
• Useful for debugging larger algorithms, with GO output conditional execution, in a single-user environment.
• Hardware accelerator brings Development system up to rate of multi-user VAX running the simulator.





[Figure: block diagram of a GAPP ARRAY OF PROCESSOR ELEMENTS (32 GAPP DEVICES) with PROGRAMMABLE WRAP-AROUND in the EAST-TO-WEST and NORTH-TO-SOUTH directions, VIDEO OUT, and MULTIBUS TO HOST]





GAPP PC Development System: Menu Interface

• Menu interface to GAL compiler program, "gc". Multiple files may be linked together.
• Menu interface to user's own text editor, for creating and correcting GAL programs.
• GAL program debugger:
  • single step through GAPP instructions.
  • execute up to a breakpoint (set in the source file by the statement "bkpoint").
  • execute entire program.





GAPP PC Development System: Menu Interface (cont)

• GAPP Data editor:
  • upload GAPP RAM data, PE register data, or data from a file.
  • on-screen edit of uploaded data.
  • download data to GAPP RAM, PE registers, or to a file.
  • data upload, download, edit can be concurrent with debug of a program.
  • data on-screen is updated whenever GAPP instructions are executed; user can step through program and "see" the data array changing.










CAPPIX I (EVALUATION GAPP) - is a plug-compatible processor board that
interfaces directly to the IBM XT or AT bus. It is designed to be used as a tool
to learn GAPP programming and runs in conjunction with the NCR "GAPSYS" software
package. The board contains 144 CPUs and 18432 bits of memory.


CAPPIX II (INTELLIGENT GAPP) - is a plug-compatible general purpose image
processing board that interfaces directly to the IBM XT or AT bus. The board is
programmable and contains its own on-board controller and memory. It has 144 CPUs,
upgradable to 288, and 36864 bits of memory, plus 4Kx16 data RAM.


144 CPUs - 


288 CPUs - 
Software supplied: 


IBM PC Programs: Download Microcode, Download/Upload data, Corner Turn 
Debugger, Full Image Swap 


Microcode Programs: Corner Turn, Arithmetic (+, -, *, /), GAPP Initialization,
GAPP State Output, Convolution


GAPSYS SOFTWARE - is a package for writing and debugging programs to be run 
on CAPPIX I hardware. 


LIVING SOFTWARE - is a software package which simulates GAPP on the IBM
Personal Computer. Users can build their own application library while utilizing all
the interactive facilities of the Forth language.


MACRO-META ASSEMBLER FOR MS-DOS 


RELOCATABLE/LINKABLE MACRO-META ASSEMBLER 
FOR MS-DOS 


GAPP Chips available to CompuPix Customers for immediate delivery...Call for Price 
Quotes. 


*GAPP is a Registered Trade Mark of NCR 





§ 4 


sm 


, a 


| ee 








NCR45CG72

GEOMETRIC ARITHMETIC PARALLEL PROCESSOR


■ APPLICATIONS

• PATTERN RECOGNITION
  Correlation
  Sobel Transform
  Spoke Filter
  Template Matching
  Automated Inspection
  Machine Vision

• PARALLEL DATA PROCESSING
  Convolution
  Matrix Operations
  Histogram
  Search and Sort

• IMAGE PROCESSING
  Image Enhancement
  Edge Detection
  2-Dimensional Convolution
  Compression
  Spatial Filtering
  Differential Imaging

• ASSOCIATIVE PROCESSOR
  Content Addressable Memory
  Limit Search
  Hamming Distance


■ GENERAL DESCRIPTION

The NCR45CG72 is a two-dimensional systolic array processor chip. It is a mesh-connected six by twelve arrangement of
1 bit processor elements. Each processor element can communicate with four neighbors: N, E, S, and W. Each processor
element is composed of a bit serial ALU, 128 X 1 bit RAM and 4 single bit latches. Three latches hold inputs to the ALU
and the fourth latch allows I/O through the cell without interrupting the ALU, i.e. I/O operations are overlapped with
computation.
The cascadeability of the GAPP allows system designers to implement arrays of processors of arbitrary size in multiples of
6 X 12 elements.


■ FEATURES

• CMOS systolic array with 72 processors per chip
• 6 X 12 array of bit serial processor elements
• Single instruction multiple data stream architecture: all processor elements operate in parallel
• GAPP devices are fully cascadeable
• System throughput increases linearly with number of processor elements in the system
• Broadcast global input and output
• Separate I/O bus = overlapped I/O and computation
• 128 bits of static RAM per processor
• VLSI double layer metal CMOS technology
• 500 milliwatts power at 10 MHz


sc ————— 
Copyright ©1984 by NCR Corporation, Dayton, Ohio, USA. All rights reserved. Printed in USA. 


NCR45CG72 
■ ABSOLUTE MAXIMUM RATINGS

Supply Voltage, VDD ............................ +7V
Voltage on any pin with respect to ground ...... -0.3 to VDD + 0.3V
Storage temperature ............................ -65°C to 150°C

Stresses above "absolute maximum ratings" may result
in damage to the device. Functional operation of devices
at the "absolute maximum ratings" or above the recommended
operation conditions stipulated elsewhere in this
specification is not implied.

CAUTION

1. CMOS devices are damaged by high energy electrostatic discharge. Devices must be stored in conductive foam or with
all pins shunted. Precautions should be taken to avoid application of voltages higher than the maximum rating.

2. Remove power before insertion or removal of this device.


s OPERATING CHARACTERISTICS 


Supply Voltage 
Supply Current (10 pF loads) 


45CG72-2 
45CG72-1 


Input Low Voltage 
Input High Voltage 


VI 
vi 
Vo 
Vo 

z 
Cin 
Co 


Leakage Current on any 
Input or 1/0 Pin 





NCR has a license from Martin Marietta Aerospace to manufacture and market GAPP devices only for commercial and industrial applications.
GAPP devices may not be sold by NCR to the military market and may not be incorporated into equipment for the military market without
authorization from Martin Marietta Aerospace, Orlando, Florida. "Military Market" shall mean the market defined by procurements of
product made directly or indirectly for the U.S. Department of Defense or any other U.S. Government agency or any foreign governments,
for use in equipment intended for military application and, technically characterized for such application by construction, extreme
environment capability, electronic circuit adaptations for specifically designed military equipment, and/or being type designated by any legally
authorized government or joint government-industry body, which can confer such designations.

NCR reserves the right to make any changes or discontinue altogether without notice with respect to any hardware or software product or the
technical content herein.





NCR45CG72 
■ TIMING DIAGRAM

[Figure: timing diagram of CLOCK (2.0V and 0.8V reference levels), inputs and outputs]

NOTE: 1, 2, 3 refer to the staging sequence of instruction, data in and data out.


» AC CHARACTERISTICS 













MAX UNITS 
SS a Fs Ta ET CM a [Ea A 
euoek cow tee 10 s000 [06008 os 
eLoeK nigh ten | 100 S00 [805000107 [os 
Fn ag RE” RRMA 1 Se MT 
Sg Ta RN OS (a ST (7S 
COUTeUTSIENABLEDI ee og een 
Glos OUTPUTLOW! = eae eo fou |e 0 
GLOBAL OUTPUTTAISTATE | tcor | 10 | 50] 10 | 3 | = | 
meuourr: = tee ao |e) 





NOTE: (1) d.c. by design; tested at 5 µsec.








NCR45CG72 
» PROCESSOR ELEMENT AND DATA BUS IDENTIFICATION 


Weo 


TOP VIEW OF PACKAGE 


F = S z - = 
= 8 = 5S = 8 > $8 > 3 = 8 
oO a oOo z oOo z Oo 2 oOo 2 Oz 


~d 
i=) 
. 
— 
~ 
s 
e 
Ww > 
Ps 
~d 


Oo 
e 
wW 


on 


CMSao 
Spo 
CMSp1 
Sai 
CMS B4 
Spa 
CMS a5 
Sas 








Eos 

Eis 

Eos GLOBAL 
CONNECTIONS 
TO EVERY 

E35 PROCESSOR 
ELEMENT 
IN THE ARRAY 

Eas 


Control Lines 
Co -Cr 









RAM Address 
RAg - RAg 






Giobal Output 
GO 





NOTE: This numbering scheme may be extended in systems which contain more than one GAPP device. 


PIN LABELS 

Woo — Weo WEST DATA BUS 
Eos — Eps EAST DATA BUS 
Noo — Nos NORTH DATA BUS 
Sao — Ses SOUTH DATA BUS 
CMSgo — CMSps5 INPUT BUS 
CMNoo — CMNos OUTPUT BUS 
RAg — RAg RAM ADDRESS BUS 
Co —Co CONTROL LINES 

— INSTRUCTION BUS 

— GLOBAL DATA INPUT BUS 
GO GLOBAL OUTPUT LINE 





—S 2 co 


—=—Ss Ss 


—3 








NCR45CG72 
« BLOCK DIAGRAM OF CONNECTIONS BETWEEN 
FOUR PROCESSOR ELEMENTS 
[Noo | Jems} | Nox 
A Bidirectional 
AY cee hx 
«| | cs eet 
== eT eg ce iceee OEE 
EW GLOBAL 






+f ae 
CMS N/S 


1) 72-input 
N/S 


i | or gate 
Fe a <CEiiroee PeRe s ealel  .1 


i 
“ a 
CMS N/S lk 


OE = Output Enable is an internal connection. 

East Outputs enabled whenever EW:=W 

West Outputs enabled whenever EW:=E 

North Outputs enabled whenever NS:=S 

South Outputs enabled whenever NS:=N 

GO is pulled low whenever any NS register contains 1





NCR45CG72 


* SCHEMATIC DIAGRAM OF ONE PROCESSOR ELEMENT 










CONTROL 
LINES 
MULTI- 
Co PLEXORS 












Cy REGISTERS 





Ya 


Ag Ay Az AzgAag F 
ADDRESS LINES 











a 


NS = NS Register
EW = EW Register
C = Carry Register
CM = Communications Register
CMS = Communications South Input
CMN = Communication North Output
SM = Sum
BW = Borrow
CY = Carry
GO = Global Output
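The three ALU outputs named in the legend can be modelled in a few lines. This is a sketch under an assumption: SM, CY and BW are taken to be the sum and carry of NS + EW + C and the borrow of NS - EW - C, the usual full adder/subtracter definitions. The function name `alu` is hypothetical.

```python
from itertools import product


def alu(ns, ew, c):
    """Assumed one-bit adder/subtracter: returns (SM, CY, BW)."""
    sm = ns ^ ew ^ c                                      # sum bit
    cy = (ns & ew) | (ns & c) | (ew & c)                  # carry of NS + EW + C
    n = ns ^ 1
    bw = (n & ew) | (n & c) | (ew & c)                    # borrow of NS - EW - C
    return sm, cy, bw


if __name__ == "__main__":
    print("NS EW C | SM CY BW")
    for ns, ew, c in product((0, 1), repeat=3):
        print(ns, ew, c, "|", *alu(ns, ew, c))
```

Fixing one input turns the adder into a logic gate: with C = 0 the carry is NS AND EW, with C = 1 it is NS OR EW, and with C = 0 the sum is NS XOR EW.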


- 
q... 





■ INSTRUCTION SET

Description (grouped by destination)

CM:
  MICRO-NOP
  LOAD CM FROM RAM
  MOVE FROM CMS INTO CM
  LOAD 0 INTO CM

NS:
  MICRO-NOP
  LOAD NS FROM RAM
  MOVE FROM N INTO NS
  MOVE FROM S INTO NS
  MOVE FROM EW INTO NS
  MOVE FROM C INTO NS
  LOAD 0 INTO NS

EW:
  MICRO-NOP
  LOAD EW FROM RAM
  MOVE FROM E INTO EW
  MOVE FROM W INTO EW
  MOVE FROM NS INTO EW
  MOVE FROM C INTO EW
  LOAD 0 INTO EW

C:
  MICRO-NOP
  LOAD C FROM RAM
  MOVE FROM NS INTO C
  MOVE FROM EW INTO C
  LOAD C FROM CARRY
  LOAD C FROM BORROW
  LOAD 0 INTO C
  LOAD 1 INTO C

RAM:
  READ        READ FROM RAM
  RAM:=CM     LOAD RAM FROM CM
  RAM:=C      LOAD RAM FROM C
  RAM:=SM     LOAD RAM FROM SUM


■ ARITHMETIC OPERATIONS

Adder/Subtracter Operations

INPUT        OUTPUT
NS EW C      SM CY BW
0  0  0      0  0  0
0  0  1      1  0  1
0  1  0      1  0  1
0  1  1      0  1  1
1  0  0      1  0  0
1  0  1      0  1  0
1  1  0      0  1  0
1  1  1      1  1  1




# LOGIC OPERATIONS 


LOGICAL 
OPERATION DESCRIPTION CONDITIONS 


EW=0,C=1 
NS=0,C=1 
NS = 0, EW=1 






















CY = NSeEW 
CY=EWeC 
CY=NSeC 
BW = NSeEW 


cy = NS + EW 
BW =NS + EW 
BW=EW+C 
SM=NS@C 


SM = NS @ EW 
SM=EW@C 













XOR 


XNOR 





7 





NCR45CG72 


« TABLE OF PIN NUMBERS VS. SIGNAL LABELS 
(CERAMIC PACKAGE) 





N.C. = No Connection 


——; 





{— 
t 
| 
ae 
| of 
h \e 
Fog 
ae 


= a 
i 
cca eae | coOr=a 





NCR45CG72 
# CERAMIC PIN GRID ARRAY PACKAGE 


BOTTOM VIEW 


1.060 + .010SQ 


pin Al indicator 


DOODOdD OD. 
LOOSE 
50000. 


SS TITS 






: 0.050 " 
+0.010 
| 0.080 + 0.008 
0.150 + 0.010 
SIDE VIEW 





NCR45CG72 | 


© TABLE OF PIN NUMBERS VS. SIGNAL LABELS | 
(PLASTIC CHIP CARRIER) 


ono nz Mm mH B&F WwW DY = 


= = = 
NR 6S 


13 





NC = No Connection 


10 








; NCR45CG72 






lee ® PLASTIC CHIP CARRIER PACKAGE 
1.190 
sa. 
a 1.153 
SQ. 
, 0.045 X 45° 
_ , 0.576 CHAMFER 
: 0.450 
aan 32 12 
0.010 X 45° SE SISSIES SSCS TRIRESIRISIsturers 4 
, CHAMFER 33 11 


o-_, 
os 


ai 


~- 
BSEGESERE Bias aiaitaetebsEaRatateseatstanee 


{ 
2 
Oo 
Tal 
a) 
on 
6) 
a 


74 
| 

| TOP VIEW PIN 1 

‘ INDICATOR 


0.045 X 45° 


: | a CHAM 


{et ai 


0.107 
0.150 
1120+] 
i) 
SIDE VIEW 
' a Ail dimensions are in inches 
v= 11 








NCR45CG72 


« TABLE OF SIGNAL NAMES VS. PIN NUMBERS 
(CERAMIC AND PLASTIC) 


SIGNAL PLA SIGNAL PLA SIGNAL 
NAMES PIN NAMES PIN NAMES PIN 
14 Nog K7 | 60 Cy J9 52 


INICIR, 


B3 
C4 
A4 
Ag 
AB 
B8 
-K1 
J4 
J5 
H6 
J? 
K8 
G8 


H9 
G9 
F9 





DEVICES TESTED WITH THIS 
OUTPUT LOAD CONFIGURATION 


DUT 
All outputs 
except GO 


Ry = 2.3K92 


DUT 
Global 


Output (GO) C, = 40pF 


Open drain output on GO allows up to “= 
4 devices to be connected together. 





NCR Microelectronics Division 2001 Danfield Ct. Fort Collins, Colorado 80525
Telex: 645-4505 NCRMICRO FTCN Phone: 303/226-9500 303/223-5100 





NCR45CT6 





GAPP™ CORNER-TURN BUFFER
(Word Serial/Bit Parallel to Word Parallel/Bit Serial Shift Register)


— | YR 


■ APPLICATIONS

• Two Dimensional Array Data Formatter
• Conversion of data from word serial/bit parallel format to word parallel/bit serial or vice versa
• Buffer for input data to a GAPP™ array


\ 
i ed —— 


= GENERAL DESCRIPTION 


—_ Oe 


The NCR45CT6 is a two-dimensional array of shift registers. It is a 6 x 12 arrangement of register groups with 5 registers
per group. Data can be independently shifted in east-west and north-south directions through latches called EW and NS or
stored in local registers called C and R. An additional register path in the N-S direction called CM is unidirectional, shifting
south to north only. The NS and EW paths can shift data bidirectionally.


The NCR45CT6 is a shift register device that allows data to be input into a GAPP array in bit-serial format from data
sources whose outputs are in word format (such as an A/D converter). The 45CT6 devices can be configured to buffer a string
of data words from an A/D converter, for example. Once a full line of data is stored it is then shifted into the GAPP array in
bit serial format. This is achieved by shifting the LSB of each word from the 45CT6 line buffer into the GAPP array in
parallel, by storing these bits in RAM within the GAPP array, and subsequently shifting increasingly significant bits from the
45CT6 line buffer into the GAPP array. In real time video applications this shifting of data into the GAPP array can take
place during the horizontal retrace interval (refer to GAPP Application Note #1 for a suggested system implementation).
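The corner-turn operation described above amounts to transposing a line of words into bit-planes. The following is a minimal software sketch, not a model of the 45CT6 control sequence; `corner_turn`, `load_line`, and the choice of consecutive RAM addresses starting at `base_addr` are assumptions made for illustration.

```python
def corner_turn(words, nbits):
    """Word-serial/bit-parallel input -> bit-planes, least significant bit first."""
    return [[(w >> i) & 1 for w in words] for i in range(nbits)]


def load_line(ram, words, nbits, base_addr=0):
    """Store each bit-plane of one buffered line at consecutive RAM addresses."""
    for i, plane in enumerate(corner_turn(words, nbits)):
        ram[base_addr + i] = plane          # one plane shifted in per step
    return ram


if __name__ == "__main__":
    line = [5, 2, 255]                      # e.g. three samples from an A/D converter
    ram = load_line({}, line, nbits=8)
    for addr in sorted(ram):
        print(addr, ram[addr])
```

Each RAM address then holds one bit of every word in the line, which is the layout the bit-serial processor elements operate on.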


am >, ele 


■ FEATURES

• CMOS register array with 360 registers
• 6 x 12 array of single bit register groups, each containing 5 registers
• Single instruction multiple data stream architecture: all register groups operate in parallel
• Devices are cascadeable in two dimensions
• Register clear and set capability
• Compatible with NCR45CG72 Geometric Arithmetic Parallel Processor chip


GAPP™ is a trademark of NCR Corporation 
Copyright © 1985 by NCR Corporation, Dayton, Ohio, USA. All rights reserved. Printed in USA. June 1985 


| 
eel eee ee ee oe oo 


~- 


9; 





NCR45CT6 
■ ABSOLUTE MAXIMUM RATINGS

Supply Voltage, VDD ............................ +7V
Voltage on any pin with respect to ground ...... -0.3 to VDD + 0.3V
Storage temperature ............................ -65°C to 150°C

Stresses above "absolute maximum ratings" may result
in damage to the device. Functional operation of devices
at the "absolute maximum ratings" or above the recommended
operation conditions stipulated elsewhere in this
specification is not implied.

CAUTION

1. CMOS devices are damaged by high energy electrostatic discharge. Devices must be stored in conductive foam or with
all pins shunted. Precautions should be taken to avoid application of voltages higher than the maximum rating.

2. Remove power before insertion or removal of this device.


e OPERATING CHARACTERISTICS 


[Supply Current WOeF loads) | top 
5 
0 
A 
ie) 


D 
L 
[| Output High Voltage (lon = 1mA) | Von 
| Temperature LT 
[Output Capacitance | 


Leakage Current on any 
input or 1/0 Pin 





NCR has a license from Martin Marietta Aerospace to manufacture and market GAPP corner turn devices only for commercial and industrial
applications. GAPP corner turn devices may not be sold by NCR to the military market and may not be incorporated into equipment
for the military market without authorization from Martin Marietta Aerospace, Orlando, Florida. "Military Market" shall mean the market
defined by procurements of product made directly for the U.S. Department of Defense or any other U.S. Government agency or any foreign
governments, for use in equipment intended for military application and, technically characterized for such application by construction,
extreme environment capability, electronic circuit adaptations for specifically designed military equipment, and/or being type designated by
any legally authorized government or joint government-industry body, which can confer such designations.

NCR reserves the right to make any changes or discontinue altogether without notice with respect to any hardware or software product or the
technical content herein.


r » | 
“= “mm ° hi 
CORN paren esrnrnenette reece ce en A a CT CC TC 


Re en Ol ee ee ee ee! ee 


—— eh 
en CO OO OU aa 





NCR45CT6 
® TIMING DIAGRAM 


& 
< 
s 
2 


ty 


VID N= OO 
ioenoenie 


INPUTS 
A——_ 7 
g eg 
8 
me ee dt 
alle ee 


Ae |. 
& 


HM 


Peackcall 


a eee 
= 
N 


NOTE: 1,2,3 refer to the staging sequence of instruction, data in and data out 


s AC CHARACTERISTICS 











CserueTME ed 
oe 
<a 
a 
Temvoureut dt [720] 


NOTE: (1) d.c. by design; tested at 5 µsec.








NCR45CT6 


s REGISTER GROUP AND DATA BUS IDENTIFICATION 


Wac 


Wao 


CMNogo 


»Noo 


~ 


CMSgp 
Sgo 


CMSa1 


TOP VIEW OF PACKAGE 


= 
o 
W 


m 
>» 
uo 


2 2 
= 4 = & 
Oo 2 Oo z 
Er 
E15 
Eos GLOBAL 
CONNECTIONS 
TO EVERY 
E35, PROCESSOR 
ELEMENT 
IN THE ARRAY: 












Control Lines 
Co -Cc 





i. 
-_ 
wd 
= 
~ 
e 
~~ 
Gy 
ond 
m 
J 
am 


rs 


CMS g5 
Ses 


NOTE: This numbering scheme may be extended in systems which contain more than one GAPP corner turn device. 


PIN LABELS 
Woo — Wao WEST DATA BUS 
Eos — Eas EAST DATA BUS 
Noo — Nos NORTH DATA BUS 
Seo — Ses SOUTH DATA BUS 
CMSap — CMSgs INPUT BUS 
CMNog — CMNos OUTPUT BUS 
M5 Re CONINSTRUCTION BUS 





— GLOBAL DATA INPUT 








mote Wee 


—_——e er ee ee eR 


| 


| st =e 
| | 





NCR45CT6 


=» BLOCK DIAGRAM OF CONNECTIONS BETWEEN 
FOUR REGISTER GROUPS 


ie re 
Bidirectional 
£ /\ \/ Noninverting Ls /\ VV 
t/O Buffer 


OF OE fs 
CMN 
NS‘ 
ae tet esel le oe 






CMS 


E 
> EW. 
Wo Fe a Es 2a 
isa one 
OE cms N/S : 
; : ' co 
| t { 
OE = Output Enable is an internal connection. 
East Outputs enabled whenever EW:=W 
West Outputs enabled whenever EW:=E 


North Outputs enabled whenever NS:=S 
South Outputs enabled whenever NS:=N 





NCR45CT6 
* SCHEMATIC DIAGRAM OF ONE REGISTER GROUP 






PLEXORS 


1 
—— at 





REGISTERS 


r 
}- 


TS a ae 


CM = Communications Register
CMS = Communications South Input
CMN = Communication North Output





ia 


aes 


(ans 


=~— 


— 
Saal 


iam | ~~ | 


Cae mene. | gee] 
Se Es 


® INSTRUCTION SET 


=o Oo 7K wx KKK KL KKK WM WK KK 


o- 07K xX KKK MILK KK MK KM 


x KK MK KR x 





NCR45CT6 


CM:
  MICRO-NOP
  LOAD CM FROM R
  MOVE FROM CMS INTO CM
  LOAD 0 INTO CM

NS:
  MICRO-NOP
  LOAD NS FROM R
  MOVE FROM N INTO NS
  MOVE FROM S INTO NS
  MOVE FROM EW INTO NS
  MOVE FROM C INTO NS
  LOAD 0 INTO NS

EW:
  MICRO-NOP
  LOAD EW FROM R
  MOVE FROM E INTO EW
  MOVE FROM W INTO EW
  MOVE FROM NS INTO EW
  MOVE FROM C INTO EW
  LOAD 0 INTO EW

C:
  MICRO-NOP
  LOAD C FROM R
  MOVE FROM NS INTO C
  MOVE FROM EW INTO C
  LOAD 0 INTO C
  LOAD 1 INTO C

R:
  MICRO-NOP
  LOAD R FROM CM
  LOAD R FROM C








NCR45CT6 


« TABLE OF PIN NUMBERS VS. SIGNAL LABELS 
(CERAMIC PACKAGE) 


N.C. = No Connection Note: All Veg pins must be connected to ground. 


CMNo1 
CMNo2 

No3 

CMNo4 
Test Output 
C7 

Ess 

CMNoo 


Vss 
Vss 
Vss 








b i NCR45CT6 
=» CERAMIC PIN GRID ARRAY PACKAGE 


a BOTTOM VIEW 


y 


LINE 


a . as, <p as 





0.080 + 0.008 


+ 






0.150 + 0.010 


SIDE VIEW 








NCR45CT6 | 


= TABLE OF PIN NUMBERS VS. SIGNAL LABELS r 
(PLASTIC CHIP CARRIER) ot 


VOD aun} 


Test Output 
Cg 
Cs 


Sees ck 





NC = No Connection Note: All Vgs connections must be connected to ground 


10 sc 


~ 





NCR45CT6 
® PLASTIC CHIP CARRIER PACKAGE 





1.190 
sa. 
1.153 
sa. 
0.045 X 45° 
0.576 CHAMFER 
0.010 X 45° : 
CHAMFER 33 ee | 
a 
u 
i 
5 60.576 
a 
B 
u 
| 
8 
Sj 4} ——_—__--_- 
D 84 
| 
a 
] 
a 
| 
0.072 LB 
8 
0.093 | = I 75 | 
0.050 
| TOP VIEW PIN 1 
{INDICATOR 
0.045 X 45° 
cameo | earn ] CHAM 





0.107 0-035 | 
0.150 
_—— er 


SIDE VIEW 
All dimensions are in inches 








NCR45CT6 


= TABLE OF SIGNAL NAMES VS. PIN NUMBERS 
(CERAMIC AND PLASTIC) 





Note: All Vss pins must be connected to ground


DEVICES TESTED WITH THIS 
OUTPUT LOAD CONFIGURATION 


DUT 
All outputs 


L 





NCR Microelectronics Division 2001 Danfield Ct. Fort Collins, Colorado 80525 
Telex: 045-4505 NCRMICRO FTCN Phone: 303/226-9500 303/223-5100 


12 


—s 


es 


S$ 








NCR45GDS1 


GAPP™ PC DEVELOPMENT SYSTEM 


GENERAL DESCRIPTION 


The GAPP PC Development System is composed of two parts. The first is a hardware board which is compatible with the
IBM-PC I/O bus and contains a 12 by 12 array of processor elements implemented with two GAPP devices. The second part
is a software package which allows the user to program the GAPP array in a high-level language and interactively debug a
program.


HARDWARE FEATURES 


12 by 12 array of GAPP Processor Elements (PE). 

12 byte reformatting/corner-turn array for data input/output. 

Interface to IBM standard bus with TTL circuitry.

Three 8-bit registers for GAPP control and address interface. 

Two I/O ports for data down-load/up-load to or from GAPP array. 

Register and port addresses are switch selected. 

PE array clock is software controlled through separate I/O port.

Printed circuit board plugs into bus connector of IBM compatible personal computers. 


Cylindrical wrap: all East and West I/O lines are horizontally connected at the left and right edges of the array; all 
North and South I/O lines are vertically connected at the top and bottom edges of the array. 
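
One way to picture the wrap-around wiring is as modular indexing over the 12 by 12 array. The Python sketch below is illustrative only: the function name `wrapped_neighbors`, the (row, column) coordinates, and the choice of row 0 as the north edge are assumptions made for the example, not part of the NCR hardware or software.

```python
# Illustrative model of the wrap-around wiring on the 12 by 12 PE array.
# Assumption: rows count from the north edge, columns from the west edge.
SIZE = 12

def wrapped_neighbors(row, col, size=SIZE):
    """Return the (row, col) of the N, S, E and W neighbors with wrap-around."""
    return {
        "N": ((row - 1) % size, col),
        "S": ((row + 1) % size, col),
        "E": (row, (col + 1) % size),
        "W": (row, (col - 1) % size),
    }

# A PE in the north-east corner wraps to the south row and the west column.
print(wrapped_neighbors(0, 11))
```

With this model, every PE has four neighbors, including those on the edges of the array.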


SOFTWARE PACKAGE FEATURES 


Menu driven with screen oriented displays 
GAL™ (GAPP Algorithm Language) compiler. 
Simple text editor for program corrections. 


Debug routines allow user to: 
single step through GAPP instructions, 
execute an entire block of GAL program statements, 
execute entire program, 
stop at any time for program corrections/re-compilation. 


GAPP PE editor allows user to:
up-load/change/down-load contents of each PE RAM,
up-load/change/down-load contents of PE registers,
store or load any of the above data to/from a data file.
Data files can be edited with the text editor.


Runs under the VENIX/86™ (NCR45GDS1-VX) operating system on any NCR Model 4 (with hard disk) or IBM PC-XT™
compatible. Also available in an MS-DOS™ version (NCR45GDS1-MS) for IBM compatible personal computers.


NCR reserves the right to make any changes or discontinue altogether without notice with respect to any hardware or software product or the
technical content herein.


Copyright © 1985 by NCR Corporation, Dayton, Ohio, USA





NCR45GDS1 


FEATURES OF GAL 


The GAPP Algorithm Language is a subset of the C programming language with several features added to tailor the language 
to the GAPP. Features of the C programming language which have been implemented are: 


All arithmetic, logical, and assignment operators. 

int variables can contain the values from -32768 to 32767.

Variables defined inside of a block (within { }'s) are automatic (storage space can be re-used outside of the block).
The if, if ... else, for, and while program statements are implemented.


Support for subroutines and int functions is provided. Arguments to the subroutine are the values of the variables which 
are used in the subroutine call. The values of these variables are not changed by the subroutine. 


Additional features have been added which are unique to GAL:


A new type of variable is used to refer to GAPP RAM addresses. An image variable is used to refer to a set of adjacent
RAM locations starting at address X, with n number of bits. Image variables are declared by one of the following program
statements:

image SCRATCH :3:7; 

image SCRATCH : 5; 
The first form defines an image named "SCRATCH" which starts at RAM address 3 and ends at RAM address 7. The second
form also defines an image named "SCRATCH," but only specifies the number of bits (5). The starting address of
the image is left up to the GAL compiler.

Image names can be used to specify the address portion of a GAPP instruction. The programmer must specify the name
of the image and an arithmetic expression which gives the offset within the image. The compiler adds the starting address
of the image to the arithmetic expression to determine the GAPP RAM address. An example is


SCRATCH :i+3 


Either the image name or the arithmetic expression may be omitted, but not both. If the expression is omitted, the compiler
uses 0 for the offset; if the image name is omitted, the compiler uses the expression for the address.


The function size( ) is built into the compiler; the function accepts the name of an image as an argument and returns the
number of bits in the image.
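
The address rules above can be sketched in a few lines of Python. This is a hypothetical model, not the GAL compiler: the `images` table and the functions `resolve` and `size` are names invented for the example, and the SCRATCH image is taken from the first declaration form (RAM addresses 3 through 7).

```python
# Hypothetical model of GAL image addressing (not the actual compiler).
# name -> (first RAM address, last RAM address), as in "image SCRATCH :3:7;"
images = {"SCRATCH": (3, 7)}

def resolve(name=None, offset=0):
    """Image start plus offset; with no image name the offset is the address."""
    start = images[name][0] if name is not None else 0
    return start + offset

def size(name):
    """Number of bits (RAM locations) in the image, like GAL's size( )."""
    first, last = images[name]
    return last - first + 1

print(resolve("SCRATCH", 2), resolve("SCRATCH"), resolve(None, 2), size("SCRATCH"))
```

An omitted expression is modeled as offset 0 and an omitted image name as a start address of 0, which matches the two defaults described in the text.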


A legal GAL program statement is a GAPP instruction made up of a GAPP RAM address and a list of GAPP assembler
mnemonics. Examples of GAPP instructions are:


X:i ew:=ram c:=cy;

ram(X:4) := c;

ew := ram(:2);

In addition to using int variables as arguments to subroutine calls, the names of images may also be used. Both the starting
address and the size of the image are put on the argument stack for the subroutine to use.


The status of the Global Output pin can be used as a criteria for conditional execution of GAL program statements. This 
is accomplished by the program statements: 


if(goset)

if(goclr)

if(goset) ... else

if(goclr) ... else

while(goset)

while(goclr)

for (...; goset; ...)

for (...; goclr; ...)

The term goclr is non-zero (true) if the Global Output is low (one or more NS registers contain a 1). The term goset is
non-zero (true) if the Global Output is high (all NS registers contain 0).
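
The two terms can be modeled as a reduction over all NS registers. The sketch below is an assumption-laden illustration in Python, not GAL: the function names and the flat list of NS register values are invented for the example.

```python
# Illustrative model of the Global Output test described above.
def global_output_high(ns_registers):
    """Global Output is high only when every NS register contains 0."""
    return all(bit == 0 for bit in ns_registers)

def goset(ns_registers):
    """Non-zero (true) when the Global Output is high."""
    return global_output_high(ns_registers)

def goclr(ns_registers):
    """Non-zero (true) when the Global Output is low."""
    return not global_output_high(ns_registers)

print(goset([0, 0, 0]), goclr([0, 1, 0]))
```

A loop such as while(goclr) therefore keeps running for as long as at least one PE still holds a 1 in its NS register.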


™GAPP and GAL are trademarks of NCR Corporation. 
™ VENIX/86 is a trademark of VenturCom, Inc. 
™ MS-DOS is a trademark of Microsoft Corp. 








NCR45GDS1 


USING THE GAPP PC DEVELOPMENT SYSTEM


a) Displaying the main menu .. . 







Copyright 1985 by NCR Corp. Dayton, Ohio, USA. All rights reserved. 


NCR GAPP PC Development System version 1.0 
GAL release 2.0. Hardware module version 1.1. 

Compile GAL program

Debug program

Execute program

GAPP RAM editor

Edit program (vi)

Initialize (clear) GAPP array


Quit 


b) Reading from GAPP RAM... 
Copyright 1985 by NCR Corp. Dayton, Ohio, USA. All rights reserved. 


NCR GAPP PC Development System version 1.0 
GAL release 2.0. Hardware module version 1.1. 


[Screen display: grid of GAPP RAM values; contents not recoverable from the scan]


GAPP RAM addresses 16 to 23 unsigned decimal display
Download [EG] Change Store to file Display mode


c}) Debugging a GAPP program ... 
Copyright 1985 by NCR Corp. Dayton, Ohio, USA. All rights reserved.


NCR GAPP PC Development System version 1.0
GAL release 2.0. Hardware module version 1.1.


Tee :F en 


1 © rt cy 
@%) carr instr Bieck (FF) Pregran (Bf wan ocit (%) seit (vid GX) abort 


$6 | 


ee ee ee 
ie 


- 


d) Flagging a program error... 


“edge, gal" 
"edge. gal” 


e) Entering editor to correct program... 
@eain{) 


{ 


, 4 ¢ Ba 
= em, 
ew :: rem: 
Tenp :- en 


Z:i 0 ewis:t rep se 


for (i = 0; a2 << B; i 


line 19: syntax error.
line 23: syntax error.


Hit RETURN to continue





ot) { 
/* Get bit from operand ¥ 8/ 
/@ Get bit from operand ¥ ¢/ 


© ost ey; ¢@ add bite and store in result 


1= © ers 3 


oe) { 


Z:i + 1 ew: rem c :2 ey; 


2:4 vem :t 


} 


© e:= 3; 


"edge.gal" 24 lines, 378 characters







NCR45GDS1 










NCR45GS2 


GAPP™ SIMULATOR/ASSEMBLER 


GENERAL DESCRIPTION


The GAPP Simulator/Assembler package is composed of two utilities which operate under the UNIX™ (NCR45GS2-UX)
or VAX/VMS™ (NCR45GS2-VM) operating systems. The first utility is the assembler, GAPASM, which translates GAPP
instruction mnemonics and address specifications into binary object code suitable for down-loading into a control store.
The simulator, GAPSIM, is an interactive package which enables the user to "execute" GAPP programs, and view the contents
of GAPP RAM and the state of the Processor Element registers in order to verify and/or debug the program.


™GAPP is a trademark of NCR Corporation 
™UNIX is a trademark of AT&T Bell Laboratories 
™VAX and VMS are trademarks of Digital Equipment Corp. 


ASSEMBLER


The assembler is invoked with a command of the form:
gapasm [ -o outfile ] [ filename ]

where "outfile" is the output filename and "filename" is the input filename. The square brackets ( [ ]'s ) indicate that both
the input and output filenames are optional. If used, the input filename must end with ".asm", and if the input filename is
not given, the standard input is assumed (terminal console, or whatever was specified using the "<" re-direction feature of
UNIX). If the standard input is the console, then the user may type GAPP assembler instructions and the assembler will
interactively return the GAPP object code values. If the output filename is not given, the assembler output goes to a file of
the same name as the input filename, but ending with ".gap". The "-o" flag can be used to direct the output to any filename,
which does not have to end with ".gap".
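
The filename rules above can be summarized in a short Python sketch. It is not GAPASM itself: `output_name` is a hypothetical helper, and returning `None` for standard input without "-o" is an assumption, since the text only says that the console case is interactive.

```python
# Hypothetical helper that mirrors the GAPASM output filename rules.
def output_name(filename=None, outfile=None):
    """Return the output filename implied by the command-line arguments."""
    if outfile is not None:
        return outfile                      # "-o" may name any file
    if filename is None:
        return None                         # standard input: assumption, see lead-in
    if not filename.endswith(".asm"):
        raise ValueError("input filename must end with .asm")
    return filename[: -len(".asm")] + ".gap"

print(output_name("edge.asm"))
```

For example, `output_name("edge.asm")` yields `edge.gap`, while `output_name("edge.asm", "edge.obj")` keeps the name given with "-o".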


The assembler assumes an input format for GAPP assembly instructions:

• The GAPP RAM address must start in column 0, and be given in hexadecimal notation (allowed values are 0 to 7F hex).
• Spaces or tabs are used to separate the RAM address from the GAPP instruction field.
• A GAPP instruction field is composed of up to five micro-instruction fields, each separated by spaces or tabs.
• A GAPP micro-instruction field is composed of a destination mnemonic and a source mnemonic, separated by '=' or
':='. Valid destination and source mnemonics are: ew, ns, cm, c, and ram. Additional valid source mnemonics are: n,
s, e, w, cy, carry, bw, borrow, sm, and plus. cy and carry have the same meaning; the same is true for bw and borrow,
and sm and plus. Not all combinations of source and destination mnemonics are legal GAPP instructions; see the GAPP
(NCR45CG72) data sheet for details.
• Comments start with a semicolon (;) and continue to the end of the line.
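
The input format listed above is simple enough to parse in a few lines. The Python sketch below is only an illustration of those rules, not the GAPASM parser: `parse_line` is a hypothetical name, the constants "0" and "1" are accepted as sources as an assumption, and the legality of each source/destination combination is not checked.

```python
# Illustrative parser for one line of GAPP assembly input (not GAPASM itself).
DESTINATIONS = {"ew", "ns", "cm", "c", "ram"}
# "0" and "1" are an assumption; they are not in the mnemonic list above.
SOURCES = DESTINATIONS | {"n", "s", "e", "w", "cy", "carry",
                          "bw", "borrow", "sm", "plus", "0", "1"}

def parse_line(line):
    """Return (ram_address, [(dest, source), ...]) or None for a blank/comment line."""
    text = line.split(";", 1)[0].rstrip()       # strip the comment
    if not text:
        return None
    if text[0] in " \t":
        raise ValueError("RAM address must start in column 0")
    fields = text.split()                        # spaces or tabs separate fields
    address = int(fields[0], 16)                 # hexadecimal RAM address
    if not 0 <= address <= 0x7F:
        raise ValueError("invalid GAPP RAM address")
    if len(fields) - 1 > 5:
        raise ValueError("more than five micro-instruction fields")
    micro = []
    for field in fields[1:]:
        dest, sep, source = field.partition(":=" if ":=" in field else "=")
        if not sep or dest not in DESTINATIONS or source not in SOURCES:
            raise ValueError(f"invalid micro-instruction: {field}")
        micro.append((dest, source))
    return address, micro

print(parse_line("8\tram:=sm c:=cy ; add and keep the carry"))
```

A fuller checker would also reject conflicting fields and illegal combinations, which depend on the NCR45CG72 data sheet.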




The assembler checks for the following errors: 

• Invalid destination or source mnemonics.
• Invalid combination of source and destination mnemonics (illegal GAPP instruction).
• Conflicting micro-instruction mnemonics (e.g. ns=ew ns=ram).
• Invalid GAPP RAM addresses.


Any errors cause an error message, but processing of the input file continues. GAPASM exits with a return code of 0 if no
errors occurred, otherwise it exits with a return code of 1. Any error messages go to the standard error output.


NCR reserves the right to make any changes or discontinue altogether without notice with respect to any hardware or software product or the 
technical content herein. 


Copyright © 1985 by NCR Corporation, Dayton, Ohio, USA 


NCR45GS2 


No filename of image to load - Occurs during a "store" command. No filename was given with the "store" command.


Invalid stack frame to load - Occurs during a "load" command. The RAM frame was not given with the "load" command,
or it was not between 0 and 15.


File write error - Occurs during a "store" command. The command could not be completed because of a system I/O error.
The system error message is printed. See the appropriate operating system reference manual for explanation of the
system error message.


File read error - Occurs during a "load" or "do" command. The command could not be completed because of a system I/O
error. The system error message is printed. See the appropriate operating system reference manual for explanation of
the system error message.


No filename to perform - Occurs during a "do" command. No filename was given with the "do" command.


Invalid index into stack - Occurs with the "rgrid" command. The RAM frame number was not given with the "rgrid" command
or was not between 0 and 15.


myshel: fork failed - An error occurred while trying to execute a UNIX command entered using the "|" feature. See the
"fork(2)" entry in the UNIX Reference Manual for explanation.


Other error messages not listed above are the result of UNIX or VAX/VMS system errors, and an appropriate system error
message will be printed. The operating system reference manuals should be consulted.







NCR45GDS1
INSERT

GAPSYS™ Interactive Menu Structure

Main Menu
  C - Compile GAL™ Program
  D - Debug/Execute GAL Program
  P - Edit Program (Editor, program file)
  I - Initialize (Clear) GAPP Array
  S - System Configuration
  X - Temporary Exit to Execute System Command
  Q - Exit GAPSYS™

Configuration Menu
  E - Specify Editor Pathname
  I - Specify Maximum Number of Instruction Cycles (Designed Timeout) for GAL Program
  A - Add File to Subroutine Library List
  D - Delete File from Subroutine Library List
  Q - Return to Main Menu

Debug Sub-menu
  G - Execute Single GAPP Instruction
  B - Execute Program to Breakpoint
  F - Execute to End of Program
  P - Edit Program (Editor, program file)
  U - Upload Data from GAPP RAM
  R - Upload Data from GAPP Register
  L - Load Data from File
  D - Edit Data File (Editor, data file)
  Q - Return to Main Menu

Register Select
  C - Display C Register
  M - Display CM Register
  N - Display NS Register
  E - Display EW Register
  A - Display All Registers

Display Data Sub-menu
  G - Execute Single GAPP Instruction
  B - Execute Program to Breakpoint
  F - Execute to End of Program
  P - Edit Program (Editor, program file)
  H,J,K,L - Cursor Movement within Data Display
  D - Download Data to GAPP Array
  S - Store Data to File (Specify Filename)
  C - Change Number Base (Hex or Decimal)
  Q - Return to Debug Menu

Copyright © 1985 by NCR Corporation, Dayton, Ohio, USA    June, 1985


NCR45GDS1

[Block diagram. Labeled elements: IBM PC I/O data bus; control register; address register; write data register; read data; corner turn line buffer (2 NCR45CT6 chips); 12 x 12 GAPP™ processor array (2 GAPP chips) with E/W and N/S wrap-around; global output.]

BLOCK DIAGRAM. GAPP PC DEVELOPMENT SYSTEM

GAPP, GAL, and GAPSYS are Trademarks of NCR Corporation















NCR45GS2 
INSERT 


USING THE GAPP SIMULATOR/ASSEMBLER


% gapsim 


Copyright 1985 NCR Corp. Dayton, Ohio, USA.
All rights reserved


GAPP Simulator version 1.0 System :UNIX 
Array size: 8 by 8 


. load demo.pic 0 
. rgrid 0 


[rgrid display of RAM frame 0; values not recoverable from the scan]





To enter the GAPP Simulator/Assembler, the user types
the command "gapsim". Upon entering the Simulator a
message is typed to inform the user of the current array
size (in this case the default size 8 by 8 is returned). A
period "." is the command line prompt within the Simulator.
In the above figure the user loads data from a file
named "demo.pic" and stores it in GAPP RAM frame 0
(RAM locations 0-7). The user then displays the contents
of RAM frame 0 using the "rgrid" command.
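
The relation between a RAM frame number and its RAM locations can be written down directly. The sketch below assumes that every frame is eight locations wide, which is consistent with frame 0 covering locations 0-7 and with the frame range 0 to 15 quoted in the error messages; `frame_locations` is a hypothetical helper, not a simulator command.

```python
# Illustrative mapping from a simulator RAM frame to GAPP RAM locations.
# Assumption: each frame spans 8 consecutive RAM locations.
def frame_locations(frame, frame_size=8):
    """Return (first, last) RAM location for a frame number between 0 and 15."""
    if not 0 <= frame <= 15:
        raise ValueError("RAM frame must be between 0 and 15")
    first = frame * frame_size
    return first, first + frame_size - 1

print(frame_locations(0), frame_locations(3))
```

Under this assumption the sixteen frames cover locations 0 through 127, which matches the 0 to 7F hex address range accepted by the assembler.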


[pgrid display of the processor element registers; values not recoverable from the scan]





Executing a program "thresh.do" on input data stored in
GAPP RAM frame 5. When the program has completed
execution, the user inspects the contents of data registers
within each processor element in the GAPP array using
the "pgrid" command. Here we see that certain processor
elements have their "C" registers set to "1" while
other processor elements' "C" registers contain "0".
All other registers ("CM", "NS", and "EW") contain logic
value 0 in all processor elements.


Copyright © 1985 by NCR Corporation, Dayton, Ohio, USA 


0    c:=0
0    ew:=ram
0    ew:=w ns:=ram
8    ram:=sm c:=cy
1    ew:=ram
1    ew:=w ns:=ram
9    ram:=sm c:=cy
2    ew:=ram
2    ew:=w ns:=ram
a    ram:=sm c:=cy
3    ew:=ram
3    ew:=w ns:=ram
b    ram:=sm c:=cy
4    ew:=ram
4    ew:=w ns:=ram
c    ram:=sm c:=cy
d    ram:=c c:=0
8    ew:=ram
8    ew:=e ns:=ram
0    ram:=sm c:=cy
9    ew:=ram
9    ew:=e ns:=ram
1    ram:=sm c:=cy
a    ew:=ram





An example of an assembly program ("horiz_edge.do")
run on the GAPP simulator, written entirely in GAPP
mnemonics.
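
The repeated pattern in the listing (load a bit into EW, shift it in from a neighbor while loading the local bit into NS, then store the sum and keep the carry) is a bit-serial addition. The Python sketch below models the first pass under stated assumptions: "sm" is treated as the sum and "cy" as the carry of NS, EW and C, the west edge of the row is fed with 0, and operands are 5 bits wide. The function name is invented for the example; this is not the GAPSIM simulator.

```python
# Illustrative bit-serial model: each PE adds its own value to the value
# held by its west neighbor, one bit plane per step.
def add_west_neighbor(values, bits=5):
    """values: one integer per PE in a row, listed west to east."""
    n = len(values)
    carry = [0] * n                                  # c:=0
    result = [0] * n
    for i in range(bits):
        own = [(v >> i) & 1 for v in values]         # ew:=ram and ns:=ram
        west = [0] + own[:-1]                        # ew:=w (edge input assumed 0)
        for pe in range(n):
            total = own[pe] + west[pe] + carry[pe]
            result[pe] |= (total & 1) << i           # ram:=sm
            carry[pe] = total >> 1                   # c:=cy
    for pe in range(n):
        result[pe] |= carry[pe] << bits              # ram:=c stores the final carry
    return result

print(add_west_neighbor([3, 5, 31]))
```

Each pass through the loop corresponds to one group of three instructions in the listing, so the cost grows with the number of bits rather than with the number of PEs.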


July 1985 





NCR45GS2 


. do horiz_edge
. rgrid 3
[rgrid display of RAM frame 3; values not recoverable from the scan]

Displaying the results of "horiz_edge.do". The output
data is stored in GAPP RAM frame 3 (RAM locations
24-31).

. load demo.pic 0
. rgrid 0
[rgrid display of RAM frame 0; values not recoverable from the scan]

Executing "horiz_edge.do" using the "do" command.
Input data from file "demo.pic" is stored in GAPP RAM
frame 0.


. xsize 10
. ysize 10
. xsize

Array size: 10 by 10.
. rgrid 6
[rgrid display of RAM frame 6 on the 10 by 10 array; values not recoverable from the scan]





Changing the GAPP processor array size from 8 by 8 to
10 by 10 using the "xsize" and "ysize" commands. The
user exits the GAPP Simulator/Assembler using the
"bye" command.







CAPPIX II


COMPUPIX ARITHMETIC PARALLEL PROCESSOR 











The CompuPix CAPPIX II is an IBM-PC BUS Compatible Processor Board with up to 288 CPU's,
36864 x 1 BIT RAM, 1152 Single Bit Latches, and 8K Bytes of Data RAM.


FEATURES:
- Single Printed Circuit Board
- 144 (expandable to 288) Processors
- Interfaces with IBM-PC and IBM-PC Compatible Systems
- Operates on UNIX* Compatible and MS-DOS Operating Systems
- 90 Day Warranty
- 4K (expandable to 8K) Bytes of Data RAM
- 36864 x 1 BIT RAM
- Data RAM Mapped into main memory

ARCHITECTURE:
The CAPPIX II is memory mapped. The Data RAM is mapped in 4K Byte segments
and can be located to meet the user requirements. The CAPPIX processor IC's
each house a CMOS array of 72 processors. Each processor is composed of a bit
serial ALU, 128 x 1 Bit RAM, and 4 single Bit latches. Three latches hold
inputs to the ALU and the fourth latch allows I/O through the cell without
interrupting the ALU. I/O operations are overlapped with computation.
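
The headline totals follow from the per-processor figures quoted in the brochure. The short Python check below is only a worked calculation; the function name `board_totals` is invented for the example.

```python
# Worked check of the CAPPIX II totals from the per-processor figures.
PROCESSORS_PER_IC = 72
RAM_BITS_PER_PROCESSOR = 128
LATCHES_PER_PROCESSOR = 4

def board_totals(processors):
    """Return (processor ICs, RAM bits, single bit latches) for a board."""
    ics = processors // PROCESSORS_PER_IC
    return ics, processors * RAM_BITS_PER_PROCESSOR, processors * LATCHES_PER_PROCESSOR

print(board_totals(144))   # base configuration
print(board_totals(288))   # fully expanded configuration
```

The fully expanded board works out to 4 processor ICs, 36864 RAM bits and 1152 latches, in agreement with the summary line.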


UNIX® is a trade and service mark of Bell Laboratories
MS-DOS is a registered trade and service mark of Microsoft Corporation


Manhattan Skyline Limited
Manhattan House Bridge Road Maidenhead Berkshire SL6 8DB
Telephone: Maidenhead (0628) 75851 Telex: 847898 MANSKY Facsimile: (0628) 782812










GENERAL DESCRIPTION 





THE COMPUPIX CAPPIX II PARALLEL PROCESSOR IS A PROGRAMMABLE GENERAL
PROCESSING BOARD, CONTAINING ITS OWN ON-BOARD CONTROLLER AND UP TO
8K BYTES OF SINGLE PORT SRAM MEMORY. BY USING SINGLE PORT RAM's, DATA
CAN BE: 1) WRITTEN INTO BY THE HOST COMPUTER, 2) READ INTO THE HOST
COMPUTER, 3) WRITTEN INTO BY THE CAPPIX PROCESSOR, OR 4) READ INTO THE
CAPPIX PROCESSOR.


THE CAPPIX II BOARD IS EASILY PROGRAMMED, AND IS SUITABLE FOR PATTERN
RECOGNITION, PARALLEL PROCESSING, AND IMAGE PROCESSING.


SPECIFICATIONS 
DIMENSIONS:       13.2 inch (33.5 cm) x 4.2 inch (10.7 cm) printed circuit board
DATA RAM:         4K (Expandable to 8K) Words
CYCLE TIME:       45 Nanoseconds
SHIPPING WEIGHT:  2.5 lbs (1.14 kg) including board and documentation
POWER:            Power supply 5 VDC ± 5%
                  Current 2.5 amp typical, 3.5 amp maximum
TEMPERATURE:      Operating: 0 degrees C to 55 degrees C
                  Shipping: -55 degrees C to +55 degrees C


INSTALLATION 


EACH CAPPIX II BOARD IS SHIPPED WITH A DETAILED INSTALLATION AND
INSTRUCTION MANUAL. INSTALLATION NORMALLY REQUIRES LESS THAN 10 
MINUTES. 





WARRANTY 


ALL CAPPIX II BOARDS ARE WARRANTED AGAINST DEFECTS IN MATERIALS OR
WORKMANSHIP FOR 90 DAYS AFTER SHIPMENT DATE. DEFECTIVE BOARDS 
COVERED BY THIS WARRANTY SHALL BE RETURNED TO COMPUPIX PREPAID AFTER 
CONTACTING COMPUPIX FOR A RETURN MATERIALS AUTHORIZATION NUMBER. 
COMPUPIX PROVIDES 48 HOURS TURN AROUND OF REPLACED OR REPAIRED 
BOARDS TO THE PURCHASER. 













NCR45CT6 
GAPP™ APPLICATION NOTE NO. 1


INPUT/OUTPUT OF REAL TIME VIDEO DATA 
USING THE NCR45CT6 


INTRODUCTION


This application note describes how the NCR45CT6 is
used to reformat real time video data for an array of
NCR45CG72 GAPP™ (Geometric Arithmetic Parallel Processor)
devices. The NCR45CT6 devices are cascaded to
create the "Corner Turn Line Buffer".


The Corner Turn Line Buffer performs two functions: 
loading data into the GAPP array, and unloading results 
from the GAPP array. 


Loading Data into the GAPP Array: 


Video data is received from an analog-to-digital converter
one pixel at a time; usually, each pixel is represented
by a group of bits (for illustrative purposes,
these "groups" of bits will be referred to as
"columns" of bits). Once an entire line of pixels is
received, the Corner Turn Line Buffer holds a column
of bits for every pixel in the line. Now, the columns
are shifted into the GAPP array one row at a
time. For example, the most-significant bit of each
column is shifted, followed by the second-most-significant
bit, etc., until all of the bits have been shifted
into the GAPP array.


Unloading Data from the GAPP Array: 


Unloading data from the GAPP array is simply a
matter of reconstructing the "columns" of bits for
each pixel in a line. The resultant rows for each video
line are shifted into the Corner Turn Line Buffer
where each bit is placed in the appropriate pixel column.
Finally, the columns are shifted out of the line
buffer one column (or pixel) at a time into a digital-
to-analog converter, where a signal is produced which
may be sent directly to a video monitor.
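
The loading and unloading steps described above amount to transposing a line of pixels into rows of bits and back. The Python sketch below illustrates that idea only; the function names are hypothetical, and the most-significant-first row order follows the example in the text rather than any particular instruction sequence.

```python
# Illustrative corner turn: pixel "columns" of bits become bit "rows".
def corner_turn(pixels, bits=6):
    """Return one row per bit, most-significant bit first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]

def reverse_corner_turn(rows):
    """Rebuild the pixel values from rows ordered most-significant bit first."""
    pixels = [0] * len(rows[0])
    for row in rows:
        pixels = [(p << 1) | bit for p, bit in zip(pixels, row)]
    return pixels

line = [0b100001, 0b000011, 0b111111]
rows = corner_turn(line)
print(rows[0], reverse_corner_turn(rows) == line)
```

Each row produced by `corner_turn` is what the array would receive in one shift; `reverse_corner_turn` models the unloading path.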


IMPLEMENTATION


The following example is provided to illustrate an actual 
implementation of the Corner Turn Line Buffer. 


For this example, a window, or partial frame, of video
data is defined as 48 lines; each line contains 48 pixels
(i.e. a window is an array of 48-by-48 pixels). Each pixel
is represented by six bits of data.


Since each NCR45CT6 device contains an array of six-
by-twelve processing elements, only one row of NCR45CT6
devices is required to hold a line of video data
(see Figure 1). Two rows of NCR45CT6 devices may be
used to hold a line of twelve bit per pixel data, or a line
of eight bit per pixel data with an additional four bit
planes available for graphic or text overlay. Four rows of
NCR45CT6 devices may be used as a twenty-four bit per
pixel line buffer (3 colors, 8 bits per color).
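
The device counts quoted above follow from the six-by-twelve organization of each NCR45CT6. The Python sketch below is a worked calculation under that reading; `line_buffer_devices` is a name invented for the example.

```python
import math

# Worked calculation of the NCR45CT6 line buffer size.
PIXELS_PER_DEVICE = 12   # twelve columns per device
BITS_PER_DEVICE = 6      # six rows (bit planes) per device

def line_buffer_devices(pixels_per_line, bits_per_pixel):
    """Return (rows of devices, devices per row)."""
    rows = math.ceil(bits_per_pixel / BITS_PER_DEVICE)
    per_row = math.ceil(pixels_per_line / PIXELS_PER_DEVICE)
    return rows, per_row

for depth in (6, 12, 24):
    print(depth, line_buffer_devices(48, depth))
```

For the 48 pixel line of this example the results are one, two and four rows of four devices for 6, 12 and 24 bits per pixel.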





[Figure 1 diagram: VIDEO IN feeds a 6-bit A-to-D converter, whose output passes through four cascaded NCR45CT6 devices (N to S) forming the CORNER TURN LINE BUFFER.]


Figure 1. Four NCR45CT6 devices connected as a 48 pixel long corner turn line buffer; each pixel is represented by 6 bits. 
The pixel data is shifted through the NS registers from the north to the south. 


Table 1. Instruction Sequence Used by the Line Buffer to Load Data from A-to-D and Unload to D-to-A

Line Buffer
Instruction Queue    Description of Actions
NS:=N                Shift in 1st pixel of line from analog-to-digital converter.
NS:=N                Shift in 2nd pixel of line from analog-to-digital converter.
NS:=N                Shift in 3rd pixel of line from analog-to-digital converter.
  .                    .
  .                    .
NS:=N                Shift in 47th pixel of line from analog-to-digital converter.
NS:=N                Shift in last pixel of line from analog-to-digital converter.















Copyright © 1985 by NCR Corporation, Dayton, Ohio, USA. All rights reserved. Printed in USA. 


August 1985 




The flow of the data from the analog Video In signal to
the analog Video Out signal is described below:


1. The analog Video In signal is digitized to create a six-
bit digital representation for each pixel.


2. The video data is shifted into the NCR45CT6 devices
by executing the sequence of instructions in Table 1.
Each NS:=N instruction shifts in one pixel from
the analog-to-digital converter.

Since each NCR45CT6 device contains a six-by-
twelve array of processors (six rows and twelve columns),
the number of devices required to implement
the six bit deep line buffer is one-twelfth the number
of pixels per line.


Figure 2 illustrates the two dimensional 48-by-48
GAPP array with one processor element (PE) per
pixel. The six bits per pixel are stored in six RAM
locations within each PE.

[Figure 2 diagram: a 48-by-48 PROCESSOR ARRAY built from NCR45CG72 6x12 GAPP devices (N to S and E to W connections not shown), with CMN lines at the north edge and CMS lines at the south edge. Below the array, four NCR45CT6 devices, each with 12 lines to the array, sit between a 6-bit A-to-D converter and a 6-bit D-to-A converter on the VIDEO path. Pull-up resistors R* to 5.0V are shown.]


Figure 2. Corner Turn Line Buffer interface to a 48X48 array of PE's (32 GAPP ICs, each containing a 6X12 array of PE's).
Image frames output from the CMN bus of the GAPP array are connected to the east inputs of the NCR45CT6
devices in the video line buffer for conversion to a video line output of 6 bits/pixel.


*R: 10 kΩ pull-up resistor on each CMS input prevents the inputs from floating when the west outputs of the corner
turn line buffer devices are at Hi-Z.



Instruction Line Buffer Processor Array 
Cycle Instruction Queue |Instruction Queue 




3. During the horizontal retrace interval, the video line
is shifted into the processing array by executing the
sequence of instructions listed in Table 2. The first
instruction does a "corner turning" operation by
shifting the data from the NS register into the EW
register of the NCR45CT6 devices. Simultaneously,
the first bit of each pixel in the GAPP array is fetched
from RAM with the CM:=RAM(0) instruction.
The next three instructions are repeated six times:
once for each bit in the pixel. As each of the six bits
per pixel is clocked out of the video line buffer
(using an EW:=E instruction), it is read into
the CM register of the bottom row of PEs of the
GAPP processor array (using a CM:=CMS instruction).
This CM:=CMS instruction also causes the
corresponding bit of each row in the processing
array to be shifted up one row. Next, the data is
saved in the appropriate RAM location while the
corner turn buffer performs a NOP instruction.
Finally, the next bit for each row in the GAPP
array is fetched from RAM to prepare for the next
shift, while the corner turn buffer executes another
NOP instruction.


4. After the entire video window has been loaded into
the processor array, computations may begin. For
some algorithms it is necessary to load two or more
sequential frames into the GAPP array (each frame
is stored in different RAM locations).


The result of a computation consists of a video win- 
dow stored in RAM. The result is output to the tine 
buffer using the sequence of instructions listed in 
Table 2: If the computed result consists of 6 bits per 
pixel and resides in RAM locations 0 through 5, it 


Table 2. Instruction Sequence Used to Load (Unload) Data To (From) GAPP Array.

Line Buffer   GAPP
Instruction   Instruction    Description of Actions

EW:=NS        CM:=RAM(0)     Move new-line data into EW of Line Buffer and load CM of
                             array with LSB of previous lines.
EW:=E         CM:=CMS        Shift LSB of new line into south end of array while
                             shifting LSBs of previous lines up 1 row.
NOP           RAM(0):=CM     Store LSB into RAM location 0.
NOP           CM:=RAM(1)     Load 2nd LSB of previous lines into CM of array to
                             prepare for next shift.
EW:=E         CM:=CMS        Shift 2nd LSB of new line into south end of array while
                             shifting 2nd LSBs of previous lines up 1 row.
NOP           RAM(1):=CM     Store 2nd LSB into RAM location 1.
NOP           CM:=RAM(2)     Load 3rd LSB of previous lines into CM of array to
                             prepare for next shift.
EW:=E         CM:=CMS        Shift 3rd LSB of new line into south end of array while
                             shifting 3rd LSBs of previous lines up 1 row.
NOP           RAM(2):=CM     Store 3rd LSB into RAM location 2.
NOP           CM:=RAM(3)     Load 3rd MSB of previous lines into CM of array to
                             prepare for next shift.
EW:=E         CM:=CMS        Shift 3rd MSB of new line into south end of array while
                             shifting 3rd MSBs of previous lines up 1 row.
NOP           RAM(3):=CM     Store 3rd MSB into RAM location 3.
NOP           CM:=RAM(4)     Load 2nd MSB of previous lines into CM of array to
                             prepare for next shift.
EW:=E         CM:=CMS        Shift 2nd MSB of new line into south end of array while
                             shifting 2nd MSBs of previous lines up 1 row.
NOP           RAM(4):=CM     Store 2nd MSB into RAM location 4.
NOP           CM:=RAM(5)     Load MSB of previous lines into CM of array to prepare
                             for next shift.
EW:=E         CM:=CMS        Shift MSB of new line into south end of array while
                             shifting MSBs of previous lines up 1 row.
NS:=EW        RAM(5):=CM     Store MSB into RAM location 5 and place the output data,
                             which has been shifted into the Line Buffer's EW from the
                             top of the processing array, into the NS.
















may be output simultaneously with the loading of a 
new video window. With the connections as shown 
in Figure 2, the output on the CMN bus from the 
north edge of the GAPP array is wrapped around 
and shifted into the east input of the line buffer. 


6. The output of each video line into the line buffer
using the instruction sequence in Table 2 spreads
the 6 bits per pixel into 6 adjacent EW registers in
the line buffer. The last instruction in Table 2 performs
the corner turning operation by placing the
contents of the EW registers into the NS registers.
Then during the input of the next video line from
the analog-to-digital converter, the resultant video
line (now in the NS registers) is shifted out to the
south (into the digital-to-analog converter) by executing
the sequence of instructions listed in Table
1. The 6 bits per pixel are output directly to a
digital-to-analog converter which may provide a video
signal directly to a display monitor.
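
The corner turning operation can be pictured as a transpose between pixel-serial form (one 6-bit pixel per register column) and bit-plane-serial form (one bit of every pixel in the line at a time). The Python sketch below illustrates only that idea. The function names `corner_turn` and `inverse_corner_turn` are hypothetical, the EW and NS register planes are modeled abstractly as lists, and no timing or shift direction is represented.

```python
def corner_turn(pixels, bits=6):
    """Split a line of pixels into bit planes, LSB first.

    planes[k][x] is bit k of pixel x, i.e. the row of bits that one
    shift would present to the array for that bit position.
    """
    return [[(p >> k) & 1 for p in pixels] for k in range(bits)]


def inverse_corner_turn(planes):
    """Reassemble pixels from bit planes (LSB first)."""
    width = len(planes[0])
    return [
        sum(planes[k][x] << k for k in range(len(planes)))
        for x in range(width)
    ]


if __name__ == "__main__":
    line = [5, 2, 63]
    planes = corner_turn(line)
    print(planes[0])                    # LSB plane of the line
    print(inverse_corner_turn(planes))  # original pixels again
```

The two helpers are exact inverses for pixel values that fit in the chosen number of bits, which mirrors the way a line is spread into bit planes on input and gathered back into 6-bit pixels on output.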


SAMPLE APPLICATION


The following example is provided to demonstrate how 
this corner turning scheme is implemented for a specific 
application. 


For this example, Frame A is originally loaded into
RAM locations 00 through 05 of the GAPP array. During
the vertical retrace period Frame A is processed and
the result (Frame A') replaces it in RAM locations 00
through 05. Then while Frame B is being loaded into the
south edge of the GAPP array from the Corner Turn
Line Buffer, Frame A' is being unloaded from the north
edge of the array into the Corner Turn Line Buffer.


In Figure 3, line 2 of Frame B has just been loaded into
the bottom row of PEs in the GAPP array and line 2 of
Frame A' has just been output from the top row of the
PEs in the GAPP array into the EW registers of the line
buffer. Next, line 2 of Frame A' is transferred to the NS
registers of the line buffer with the operation NS:=EW.
Now as line 3 of Frame B is shifted into the line buffer,
line 2 of Frame A' is simultaneously shifted out
from the line buffer into the digital-to-analog converter.
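
The simultaneous load and unload described above can be sketched in Python. This is a simplified model under stated assumptions, not the GAPP microcode: `ram[k]` stands for RAM location k viewed as a list of rows (north row first), and the helper names `empty_ram`, `shift_in_line`, `load_frame` and `read_frame` are hypothetical. Each pass over a bit plane mirrors the three-step pattern CM:=RAM(k), CM:=CMS, RAM(k):=CM.

```python
def empty_ram(rows, width, bits=6):
    """RAM planes 0..bits-1, each a list of rows (north row first)."""
    return [[[0] * width for _ in range(rows)] for _ in range(bits)]


def shift_in_line(ram, new_line):
    """Shift one new video line in at the south edge.

    Returns the line that leaves the north edge, reassembled into pixels.
    """
    width = len(new_line)
    leaving = [0] * width
    for k in range(len(ram)):
        cm = ram[k]                                   # CM := RAM(k)
        north_row = cm[0]                             # row pushed out the top
        south_row = [(pixel >> k) & 1 for pixel in new_line]
        cm = cm[1:] + [south_row]                     # CM := CMS (shift up 1 row)
        ram[k] = cm                                   # RAM(k) := CM
        for x in range(width):
            leaving[x] |= north_row[x] << k
    return leaving


def load_frame(ram, frame):
    """Load a whole frame; returns the frame that was unloaded meanwhile."""
    return [shift_in_line(ram, line) for line in frame]


def read_frame(ram):
    """Reassemble the pixels currently held in the RAM planes."""
    rows, width = len(ram[0]), len(ram[0][0])
    return [
        [sum(ram[k][r][x] << k for k in range(len(ram))) for x in range(width)]
        for r in range(rows)
    ]


if __name__ == "__main__":
    ram = empty_ram(rows=3, width=2)
    load_frame(ram, [[1, 2], [3, 4], [5, 6]])          # Frame A
    print(load_frame(ram, [[7, 8], [9, 10], [63, 0]]))  # Frame A comes back out
```

Loading a second frame returns the lines of the first frame in their original order, which is the behavior the text describes for Frame A' and Frame B.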


For many applications requiring multiple frames, a more
sophisticated scheme is used. Pipelining of frames may
be required to obtain desired throughput. Another
scheme might utilize special features of a hardware
controller or the GAPP Language compiler to allow
execution of microcode in the GAPP array while data is
being loaded into the array (i.e., during a portion of the
horizontal retrace period as well as the vertical retrace
period).


[Figure 3 diagram: CORNER TURN LINE BUFFER (EW REGISTER PLANE and NS REGISTER PLANE, linked by EW:=NS) above the PROCESSOR ARRAY and its RAM planes holding Frame A' and Frame B. Labeled data paths: Input Line 3 of Frame B (from A/D converter); Output Line 3 of Frame A'; Output Line 2 of Frame A' (to D/A converter).]

Figure 3. Illustration of one type of data flow in the GAPP system. Data is input to the NS registers from the right edge of
the corner turn line buffer, transferred to the EW registers, shifted into the GAPP processor array on the CM bus,
and downloaded into RAM. Result data from RAM is uploaded into the CM register, shifted out of the array and
into the EW registers of the corner turn line buffer; finally, it is transferred to the NS registers and shifted out the
left end of the corner turn line buffer.


NCR Microelectronics Division 2001 Danfield Ct. Fort Collins, Colorado 80525-2998
Telex: 045-4505 NCRMICRO FTCN Phone: 303/226-9500 303/223-5100 













GAPP APPLICATION NOTE NO. 2 


GAPP MEMORY EXPANSION 


Although the Geometric Arithmetic Parallel Processor,
or GAPP chip, contains 128 bits of RAM per processing
element, it is occasionally desirable to expand the memory
using external static RAM to provide capability of
storing additional image data or to perform numerical
computations that require more memory space.


Figure A shows one possible memory expansion technique
which utilizes six n X 1 bit RAM chips per GAPP
device in the system. Figure A shows one RAM chip
connected to each of the six CMN lines on a GAPP device.
This configuration is preferred when additional
memory requirements are on the order of 2K or more
bits per processor element. For example, using six 64K
by 1 RAM chips would provide 5461 bits of RAM per
processing element (64K divided by 12 is 5461, leaving a
total of 4 bits of RAM unused).


Figure B depicts a memory expansion technique utilizing
a single 8 bit wide RAM. This configuration is used when
less than 2K of RAM per processing element is required
because it minimizes the amount of hardware required
(i.e., it only requires one RAM chip). For example, 682
bits of RAM per processor can be provided by a single
8K X 8 RAM chip.
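
The two capacity figures quoted above follow from dividing the external RAM depth by the 12 addresses that each bit plane occupies. The short Python sketch below reproduces that arithmetic; the function name `external_bits_per_pe` is hypothetical, and the constant 12 is taken from the "divided by 12" statement in the text.

```python
ADDRESSES_PER_BIT_PLANE = 12  # from the text: "64K divided by 12"


def external_bits_per_pe(ram_depth):
    """Return (bits per processing element, unused addresses).

    ram_depth is the number of addressable locations in each external
    RAM (65536 for a 64K x 1 part, 8192 for an 8K x 8 part).
    """
    return divmod(ram_depth, ADDRESSES_PER_BIT_PLANE)


if __name__ == "__main__":
    print(external_bits_per_pe(64 * 1024))  # six 64K x 1 RAMs (Figure A)
    print(external_bits_per_pe(8 * 1024))   # one 8K x 8 RAM (Figure B)
```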


In both configurations the GAPP uses the CM bus so
that data transfer to RAM has minimum impact on algorithm
execution time. It takes 12 clock cycles to shift a
plane of data across the CM plane and one cycle to transfer
the single bit plane of data to RAM inside the GAPP.
Thus the time to transfer 8 bits of data between GAPP
RAM and external RAM is 96 data shift instructions plus
8 GAPP RAM data operations. Because of the GAPP
chip's unique architecture, processor operations within
the GAPP array need only be interrupted during the 8
GAPP RAM operations. Thus, for every RAM operation
there are 12 CM shift operations that can execute
concurrently with program execution. If the user's
application can be processed in a pipelined fashion, the loading
of new data can take place concurrently with program
execution on previously loaded data.
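
The cycle counts in the paragraph above can be checked with a few lines of Python. This is only an arithmetic sketch; `transfer_cycles` is a hypothetical helper name, and the figures of 12 shifts and 1 RAM operation per bit plane come from the text.

```python
def transfer_cycles(bit_planes, shifts_per_plane=12, ram_ops_per_plane=1):
    """Return (CM shift cycles, GAPP RAM cycles) for a transfer.

    Only the GAPP RAM cycles interrupt processing in the array; the
    CM shifts can overlap with program execution.
    """
    return bit_planes * shifts_per_plane, bit_planes * ram_ops_per_plane


if __name__ == "__main__":
    shifts, ram_ops = transfer_cycles(8)
    print(shifts, ram_ops)
```

For an 8-bit transfer this gives 96 shift instructions and 8 RAM operations, so only 8 of the 104 cycles need to stall array processing.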


Table 1 provides a program listing that writes to the ex- 
ternal memory. Table 2 provides a program listing that 
reads from the external memory (refer to Figures C and 
D). 


TMGAPP is a trademark of NCR Corporation. 


TABLE 1. PROGRAM LISTING FOR WRITING DATA TO EXTERNAL RAM.

GAPP instructions supplied from the instruction queue of the GAPP controller

                                       Buffer    External
Instruction  GAPP                      Tristate  Memory
Number       Instruction        R/W    Control   Address   Comments

1st bit plane:
   1         m    CM:=RAM        1        1         -      Begin WRITING to external RAM by
                                                           loading GAPP data into the CM plane.
   2              CM:=CMS        1        1         -      There is a one cycle pipeline delay
                                                           before write to RAM begins.
   3              CM:=CMS        0        1         n
   4              CM:=CMS        0        1         n+1
   5              CM:=CMS        0        1         n+2
   6              CM:=CMS        0        1         n+3
   7              CM:=CMS        0        1         n+4
   8              CM:=CMS        0        1         n+5
   9              CM:=CMS        0        1         n+6
  10              CM:=CMS        0        1         n+7
  11              CM:=CMS        0        1         n+8
  12              CM:=CMS        0        1         n+9
  13              CM:=CMS        0        1         n+10
2nd bit plane:
  14         m+1  CM:=RAM        0        1         n+11   Finish writing bit plane to RAM.
  15              CM:=CMS        1        1         -      Start next bit plane.
  16              CM:=CMS        0        1         n+12
   .
   .
  26              CM:=CMS        0        1         n+22
  27         m+2  CM:=RAM        0        1         n+23   Finish writing bit plane to RAM.
  28              CM:=CMS        1        1         -      Start next bit plane.
  29              CM:=CMS        0        1         n+24
   .
   .
 104              CM:=CMS        0        1         n+94
 105              NOP            0        1         n+95   Finish writing 8th bit plane.

n represents the base address of the external RAM that is written to.
m represents the base address within GAPP RAM.

Copyright © 1985 by NCR Corporation, Dayton, Ohio, U.S.A. All rights reserved. Printed in U.S.A.

June 1985


TABLE 2. PROGRAM LISTING FOR READING DATA FROM EXTERNAL RAM.

GAPP instructions supplied from the instruction queue of the GAPP controller

                                       Buffer    External
Instruction  GAPP                      Tristate  Memory
Number       Instruction        R/W    Control   Address   Comments

1st bit plane:
   1              CM:=CMS        1        1         -      Begin READING from external RAM.
   2              CM:=CMS        1        0         n      There is a one cycle pipeline delay
   3              CM:=CMS        1        0         n+1    before read from RAM begins.
   4              CM:=CMS        1        0         n+2
   5              CM:=CMS        1        0         n+3
   6              CM:=CMS        1        0         n+4
   7              CM:=CMS        1        0         n+5
   8              CM:=CMS        1        0         n+6
   9              CM:=CMS        1        0         n+7
  10              CM:=CMS        1        0         n+8
  11              CM:=CMS        1        0         n+9
  12              CM:=CMS        1        0         n+10
  13         m    RAM:=CM        1        0         n+11   Finish reading bit plane from RAM
                                                           and store in GAPP RAM.
2nd bit plane:
  14              CM:=CMS        1        1         -      Start next bit plane.
  15              CM:=CMS        1        0         n+12
  16              CM:=CMS        1        0         n+13
   .
   .
  26         m+1  RAM:=CM        1        0         n+23   Finish reading bit plane from RAM
                                                           and store in GAPP RAM.
  27              CM:=CMS        1        1         -      Start next bit plane.
  28              CM:=CMS        1        0         n+24
  29              CM:=CMS        1        0         n+25
   .
   .
 103              CM:=CMS        1        0         n+94
 104         m+7  RAM:=CM        1        0         n+95   Finish reading 8th bit plane.
 105              NOP            1        1         -


n represents the base address of the external RAM that is read from. 
m represents the base address within GAPP RAM 
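
The regular structure of the read listing (13 instructions per bit plane, 8 planes, one trailing NOP) lends itself to being generated rather than typed. The Python sketch below is one way a controller program might be tabulated. The `Step` record and `read_program` function are hypothetical names, addresses are expressed as offsets from n and m, and the values simply follow the pattern visible in Table 2.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    number: int                  # instruction number (1-based)
    instruction: str             # GAPP instruction mnemonic
    ram_offset: Optional[int]    # GAPP RAM address as an offset from m
    rw: int                      # R/W line
    tristate: int                # buffer tristate control
    ext_offset: Optional[int]    # external RAM address as an offset from n


def read_program(planes: int = 8, shifts: int = 12) -> List[Step]:
    """Tabulate the external-RAM read sequence, one Step per instruction."""
    steps: List[Step] = []
    for k in range(planes):
        base = k * (shifts + 1)
        # First cycle of each plane: no external address is presented.
        steps.append(Step(base + 1, "CM:=CMS", None, 1, 1, None))
        # Next shifts-1 cycles: shift while stepping through the addresses.
        for j in range(shifts - 1):
            steps.append(
                Step(base + 2 + j, "CM:=CMS", None, 1, 0, k * shifts + j)
            )
        # Last cycle of the plane: store the plane into GAPP RAM.
        steps.append(
            Step(base + shifts + 1, "RAM:=CM", k, 1, 0, k * shifts + shifts - 1)
        )
    steps.append(Step(len(steps) + 1, "NOP", None, 1, 1, None))
    return steps


if __name__ == "__main__":
    for step in read_program()[:14]:
        print(step)
```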
NCR reserves the right to make any changes or discontinue altogether without notice with respect to any hardware or software product or the
technical content herein.







[Figure A diagram: GAPP DEVICE with CMN and CMS ports connected through 74LS241 TRISTATE BUFFERs to external RAMs; control lines TRISTATE CONTROL, R/W and ADDRESS. Annotations: 1 CYCLE DELAY IN EXECUTION PER PLANE; 12 CYCLES INTERLEAVED PER PLANE.]

Figure A. Memory expansion of a GAPP based system using six n x 1 bit
external RAMs.


[Figure B diagram: GAPP DEVICE with CMN and CMS ports connected through 74LS241 TRISTATE BUFFERs to a single external RAM; control lines TRISTATE CONTROL, R/W and ADDRESS. Annotations: 1 CYCLE DELAY IN EXECUTION PER PLANE; 12 CYCLES INTERLEAVED PER PLANE.]

Figure B. Memory expansion of GAPP based system using a single n x 8 bit
external RAM.










[Figure C diagram: three panels showing CONTENTS OF GAPP CM REGISTERS, CONTENTS OF EXTERNAL RAM and EXTERNAL RAM ADDRESS after each of the instructions 1. m CM:=RAM; 2. CM:=CMS; 3. CM:=CMS; with data leaving the array on CMN and entering the RAM DATA IN lines.]

Figure C. Diagram of data transfer from location m in GAPP RAM into external RAM via the CM registers. See program
listing in Table 1. Shown above are the results of the first 3 GAPP instructions referred to in Table 1. On the
left, each square represents the CM register for each processing element in the GAPP processor array. On the
right, each column represents an n x 1 bit external RAM. Each square within a column represents a distinct
RAM location. The column to the far right lists addresses for each location, with the highlighted address
representing the external RAM location being accessed in the current instruction cycle.


[Figure D diagram: three panels showing CONTENTS OF GAPP CM REGISTERS, CONTENTS OF EXTERNAL RAM (RAM 1 through RAM 6) and EXTERNAL RAM ADDRESS after each of the instructions 1. CM:=CMS; 2. CM:=CMS; 3. CM:=CMS; with RAM DATA OUT entering the array on CMS.]

Figure D. Diagram of data transfer from external RAM into CM registers of a GAPP device. After data is shifted into the
CM registers it is then transferred to GAPP RAM. See program listing in Table 2. Shown above are the results of
the first 3 GAPP instructions referred to in Table 2. On the left, each square represents the CM register for each
processing element in the GAPP processor array. On the right, each column represents an n x 1 bit external RAM.
Each square within a column represents a distinct RAM location. The column to the far right lists addresses for
each location, with the highlighted address representing the external RAM location being accessed in the current
instruction cycle. After an entire bit plane is loaded into CM from external RAM it is then stored in GAPP RAM.







NCR45CM16 





CMOS 16 X 16 BIT 
SINGLE PORT MULTIPLIER/ACCUMULATOR 


GENERAL DESCRIPTION


The NCR45CM16 is a 24 pin CMOS multiplier/accumulator for use with 16-bit microprocessor systems. All input and output
data are transferred through a single 16-bit bidirectional data bus in signed two's complement format. This device is
TTL/CMOS compatible and requires no clock due to its totally static (asynchronous) operation. The device may be attached to the
system bus in the same way as a 16-bit wide static RAM. A single 16 x 16 multiply and read 32-bit result requires 5 cycles
(write X, write Y, multiply, read high-order result, read low-order result). Pipelined multiply/accumulate operations require
only 2 cycles each.


FEATURES
● 24 Pin Package
  - 300 mil Ceramic "Skinny DIP"
  - 600 mil Plastic DIP
● 40 bit Accumulator
  - Add Product to Accumulator
  - Subtract Product from Accumulator
● Cycle Time 190 ns (typ)
● Low Power CMOS
  - 100 µW Standby (max)
  - 10 mA Operating (max)
● Single 5 Volt ± 10% Supply
● Fully Static Operation, No Clock Required
● 3-state Bus Compatible Outputs












PIN CONFIGURATION

NCR45CM16-P (Plastic DIP) and NCR45CM16-C (Ceramic DIP)

  CS    1     24   VDD
  A3    2     23   A2
  WE    3     22   A1
  D8    4     21   A0
  D9    5     20   D7
  D10   6     19   D6
  D11   7     18   D5
  D12   8     17   D4
  D13   9     16   D3
  D14   10    15   D2
  D15   11    14   D1
  GND   12    13   D0

PIN NAMES

  D0-D15   Data Inputs/Outputs
  CS       Chip Select
  WE       Write Enable
  VDD      5V ± 10% Supply Voltage

* Specifications are subject to
change without notice.

FUNCTIONAL BLOCK DIAGRAM

[Block diagram: data bus D0-D15, X-REGISTER, MULTIPLIER ARRAY, ACCUMULATOR, MULTIPLEXER.]








Copyright © 1984 by NCR Corporation, Dayton, Ohio, U.S.A. All Rights Reserved. Printed in U.S.A. 




ABSOLUTE MAXIMUM RATINGS


Supply Voltage, VDD ................................ +7V
Voltage on any pin with respect to ground .......... -0.3 to VDD + 0.3V
Storage temperature ................................ -65°C to 150°C

Stresses above "absolute maximum ratings" may result
in damage to the device. Functional operation of devices
at the "absolute maximum ratings" or above the recommended
operation conditions stipulated elsewhere in this
specification is not implied.


RECOMMENDED OPERATING CONDITIONS


Parameter

Supply voltage
Input high level voltage
Input low level voltage
Operating ambient temperature





STATIC ELECTRICAL CHARACTERISTICS
OVER RECOMMENDED OPERATING CONDITIONS


Input leakage current          VIN = 0V to VDD max
Output leakage current         VO = 0.4 to VDD max, CS = 1
Output high voltage            IOH = 400 µA
Output low voltage             IOL = 2.1 mA
Supply current, Active         Outputs Open
Supply current, Standby





*Typical limits are Vpp = 5.0V, Ta = 25°C; typical parameters are not guaranteed 


CAPACITANCE  TA = 25°C, f = 1 MHz


Input capacitance              All pins except pin under test
Input/Output capacitance       are tied to ground







READ OPERATIONS (WE = 1)

A3  A2  A1  A0   OPERATION
X   0   0   0    Read bits 0 through 15 of result from accumulator
X   0   0   1    Read bits 16 through 31 of result from accumulator
X   0   1   0    Read bits 32 through 47 of result* from accumulator


DIVIDE BY 2 AND READ (WE = 1)

A3  A2  A1  A0   OPERATION
X   1   0   0    Read bits 1 through 16 of result from accumulator
X   1   0   1    Read bits 17 through 32 of result from accumulator
X   1   1   0    Read bits 33 through 48 of result* from accumulator

X = Don't care

*NOTE: Accumulator accumulates to 40 bits. Thus bits 0 - 39 are valid, while bits 40 - 48 are a sign extension of bit 39.


WRITE OPERATIONS (WE = 0)

[Table: A3, A2 select the ACCUMULATOR OPERATION; A1, A0 select the MULTIPLIER OPERATION.]

A = Accumulator
Data latched into X-register
Data latched into Y-register








EXAMPLE OPERATIONS


1. Multiply two 16-bit numbers, read 32-bit result

Instruction  WE  Operation
0010         0   Clear A, Write X
0001         0   Clear A, Write Y
0100         0   Add X·Y to A
0000         1   Read low order result
0001         1   Read high order result


2. Multiply two 16-bit numbers and accumulate, repeat five times (five point digital filter), read 40-bit result = X1·Y1 +
X2·Y2 + X3·Y3 + X4·Y4 + X5·Y5

Instruction  WE  Operation
0010         0   Clear A, Write X1
0001         0   Clear A, Write Y1
0110         0   A = X1·Y1, Write X2
1001         0   Write Y2
0110         0   A = X1·Y1 + X2·Y2, Write X3
1001         0   Write Y3
0110         0   A = X1·Y1 + X2·Y2 + X3·Y3, Write X4
1001         0   Write Y4
0110         0   A = X1·Y1 + X2·Y2 + X3·Y3 + X4·Y4, Write X5
1001         0   Write Y5
0100         0   A = X1·Y1 + X2·Y2 + X3·Y3 + X4·Y4 + X5·Y5
0010         1   Read most significant bits (32-47) of result
0001         1   Read bits 16-31 of result
0000         1   Read bits 0-15 of result


3. Half of sum of squares = ½ (B1² + B2²)

Instruction  WE  Operation
0011         0   Clear A, Write B1 to Registers X and Y
0111         0   A = B1², Write B2 to Registers X and Y
0100         0   A = B1² + B2²
0101         1   Divide A by 2 and read most significant 16 bits


4. Scale a series of numbers by a constant

Instruction  WE  Operation
0001         0   Clear A, Write Constant to Y
1010         0   NOP A, Write X1
0100         0   A = X1·Y
0001         1   Read high order result
0000         1   Read low order result
0010         0   Clear A, Write X2
0100         0   A = X2·Y
0001         1   Read high order result
0000         1   Read low order result
0010         0   Clear A, Write X3
0100         0   A = X3·Y
0001         1   Read high order result
0000         1   Read low order result





AC CHARACTERISTICS  VDD = 4.5 to 5.5V, TA = 0 to 70°C, VIL = 0.0V, VIH = 3.0V

READ CYCLE

Read Cycle Time
Address Access Time
Chip Select to Output Data Valid
Write Enable Set Up Before Select
Read Recovery Time
Chip Deselect to Output High-Z
Data Hold from Read Time


READ CYCLE TIMING WAVEFORMS
The read operation is performed with WE = high. The falling edge of CS latches the address and initiates the read process.

[Figure: read cycle timing waveforms, including the DATA OUT trace.]


WRITE CYCLE

PARAMETER

Write Cycle Time
Address Valid to End of Write
Write Enable Set Up Before Select
Write Recovery Time
Write Pulse Width
Data Set Up to Write Time
Data Hold From Write Time
Write Enable to Output Hi-Z


WRITE CYCLE TIMING WAVEFORMS
The write operation is performed with WE = low. The falling edge of CS latches the address and the rising edge of CS latches
the data in.

[Figure: write cycle timing waveforms for ADDRESS, CS and DATA IN.]





AC TEST LOAD CIRCUIT

[Figure: test load circuit connected to the OUTPUT UNDER TEST.]

*Includes jig capacitance.
All diodes 1N3064 or equivalent.





CAUTION 


1. CMOS Devices are damaged by high energy electrostatic discharge. Devices must be stored in conductive foam or 
with all pins shunted. 


2. Remove power before insertion or removal of this device. 




NCR Corporation

Microelectronics - Fort Collins
2001 Danfield Court
Fort Collins, Colorado 80525-2998


ERRATA 


NCR45CM16 SINGLE PORT MULTIPLIER DATA SHEET 


The "WRITE OPERATIONS" table on page 3 of the data sheet implies 
that data can be written to the Y register while simultaneously 
adding or subtracting the previous XY product into the accumula- 
tor. This CANNOT be done in all cases. The following table 
replaces the WRITE OPERATIONS table at the bottom of page 3. Note 
that op-codes 0101 and 1101 are invalid. 


WRITE OPERATIONS (WE=0)

A3 A2 A1 A0   Operation
0  0  0  0    Clear Accumulator; Retain X and Y
0  0  0  1    Clear Accumulator; Write new data to Y
0  0  1  0    Clear Accumulator; Write new data to X
0  0  1  1    Clear Accumulator; Write new data to X and Y
0  1  0  0    Add X·Y to Accumulator; Retain X and Y
0  1  0  1    Invalid Operation
0  1  1  0    Add X·Y to Accumulator; Write new data to X
0  1  1  1    Add X·Y to Accumulator; Write new data to X and Y
1  0  0  0    Retain Accumulator; Retain X and Y (NOP)
1  0  0  1    Retain Accumulator; Write new data to Y
1  0  1  0    Retain Accumulator; Write new data to X
1  0  1  1    Retain Accumulator; Write new data to X and Y
1  1  0  0    Subtract X·Y from Accum.; Retain X and Y
1  1  0  1    Invalid Operation
1  1  1  0    Subtract X·Y from Accum.; Write new data to X
1  1  1  1    Subtract X·Y from Accum.; Write new data to X and Y
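
A small behavioural model can help when checking instruction sequences against the op-codes in this errata. The Python sketch below is an illustration only, not a description of the silicon: the class name `Mac45CM16Model` and its methods are hypothetical, the accumulator is assumed to wrap at 40 bits on overflow (the data sheet does not say what happens), and timing is ignored. The op-code decoding follows the corrected WRITE OPERATIONS table and the READ OPERATIONS tables of the data sheet.

```python
def _signed(value, bits):
    """Wrap an integer into the signed two's complement range of `bits`."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class Mac45CM16Model:
    """Hypothetical behavioural sketch of the multiplier/accumulator."""

    INVALID_WRITES = (0b0101, 0b1101)

    def __init__(self):
        self.x = 0
        self.y = 0
        self.acc = 0

    def write(self, opcode, data=0):
        """Write cycle (WE = 0): opcode is A3 A2 A1 A0."""
        if opcode in self.INVALID_WRITES:
            raise ValueError("op-codes 0101 and 1101 are invalid for writes")
        a3 = (opcode >> 3) & 1
        a2 = (opcode >> 2) & 1
        product = self.x * self.y        # operands latched before this cycle
        if a2:                           # accumulate: add (A3=0) or subtract (A3=1)
            self.acc += -product if a3 else product
        elif not a3:                     # A3 A2 = 00: clear accumulator
            self.acc = 0
        self.acc = _signed(self.acc, 40)  # assumption: wrap on overflow
        if opcode & 0b0010:
            self.x = _signed(data, 16)
        if opcode & 0b0001:
            self.y = _signed(data, 16)

    def read(self, opcode):
        """Read cycle (WE = 1): returns one 16-bit word of the result."""
        word = opcode & 0b0011
        if word == 3:
            raise ValueError("A1 A0 = 11 is not a listed read operation")
        shift = 1 if opcode & 0b0100 else 0   # A2 = 1: divide by 2 and read
        return (self.acc >> (16 * word + shift)) & 0xFFFF


def sum_of_products(xs, ys):
    """Run the five-point-filter style sequence from the data sheet examples."""
    mac = Mac45CM16Model()
    mac.write(0b0010, xs[0])             # Clear A, Write X1
    mac.write(0b0001, ys[0])             # Clear A, Write Y1
    for x, y in zip(xs[1:], ys[1:]):
        mac.write(0b0110, x)             # accumulate previous product, write next X
        mac.write(0b1001, y)             # retain A, write next Y
    mac.write(0b0100)                    # last accumulate
    raw = mac.read(0b0000) | (mac.read(0b0001) << 16) | (mac.read(0b0010) << 32)
    return _signed(raw, 48)


if __name__ == "__main__":
    print(sum_of_products([1, -2, 3, -4, 5], [10, 20, -30, 40, 50]))
```

Note that the loop uses only 0110 and 1001 between products, which is consistent with the restriction above that Y cannot be written alone while a product is being accumulated.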








APPLICATION NOTE M-1 
NCR45CM16 


MICROPROCESSOR MULTIPLICATION ACCELERATOR 


As many assembly language programmers can attest, performing
multiplication operations with a microprocessor
can take a great amount of time. The unaided microprocessor
is especially slowed down by repeated multiply-accumulate
operations that are common in process control
or digital signal processing applications. This reduced
performance limits the maximum bandwidth signal that
the general purpose microprocessor can handle.


The alternatives, however, for improving the effective
throughput of the processor are expensive. Previously
the system designer could add a special purpose array
processor board to his system, or redesign his system to
use a special purpose DSP microprocessor. Both of
these options require high expense or extensive engineering
which may not be justified for many applications.


Another solution, that of adding an expensive three port
multiplier chip with the associated latches and logic required
to interface it to the system, can take up a large
amount of system board space and consume an inordinate
amount of power.


On the other hand, a small, low power multiplier chip
that could be interfaced to the system with little additional
circuitry would be an attractive solution to the
throughput problem. NCR has developed a microprocessor
bus compatible, 16 x 16 multiplier (NCR45CM16)
which is designed specifically as a microprocessor
"multiplication accelerator". It is packaged in a small 24-pin
DIP and typically consumes only 5mA while cycling
through multiply-accumulate operations at a 5 MHz pace.
One important feature of the device is its simple system
interface. The NCR45CM16 attaches to the microprocessor
bus and appears to the system as a 200-ns, 16-bit
wide static RAM. Figure 1 shows the size of the
NCR45CM16 package next to a conventional three-port
multiplier. The small size of the single port device will
allow its incorporation into many existing microcomputer
boards.





Figure 1. Comparison between the NCR45CM16 (below)
and conventional three-port multiplier/accumulator
chip (above) clearly shows that the
bulky three-port does not size up for space
limited microprocessor board designs.


Copyright © 1985 by NCR Corporation, Dayton, Ohio, U.S.A. All Rights Reserved. Printed in U.S.A.


USE IN A SYSTEM


The multiplier chip is most easily used if it is mapped
directly into the processor's memory space. This is because
the device has Chip enable (CE) and Write enable
(WE) pins that perform the same functions as they
would for a static RAM. When the device is not enabled
the I/O pins will go into a high impedance state that
effectively disconnects the multiplier from the system bus.
As shown in Figure 2, the chip has input registers X-REG
and Y-REG that are written to through the single port
bus interface. The product of these registers is then
available for an accumulate operation on the next cycle.
This "product" may be added to or subtracted from the
40-bit accumulator while the X register is simultaneously
updated. The result in the 40-bit accumulator may be
read 16 bits at a time: least significant 16, most significant
16, or high significant 16. The latter is produced
only with repeated multiply-accumulates that create a
result greater than 32 bits. Figure 3 provides details of
the multiplier operation. Control of the input registers,
output registers and accumulator operation is determined
by bits of the address bus (A0-A3).


For a series of multiply-accumulate operations (such as
an FIR filter computation), the device can operate as
a two cycle pipeline (Write to X-REG and accumulate,
Write to Y-REG). After the last arithmetic operation,
three read operations would be required to obtain the
full precision output. A multiplier-aided 68000 or 8086
will be approximately three times faster than an unaided
68000 or 8086 microprocessor using only the internal
multiply instruction.
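
The bus cycle count implied by the two cycle pipeline can be written down directly. The Python sketch below is an arithmetic illustration under the assumption that every product costs two write cycles, that one further write cycle performs the last accumulate, and that each 16-bit word of the result costs one read cycle. The function name `mac_bus_cycles` is hypothetical.

```python
def mac_bus_cycles(n_products, result_words=3):
    """Bus cycles needed to sum n_products products and read the result.

    Two writes per product (X and Y), one final accumulate write,
    plus one read per 16-bit result word.
    """
    writes = 2 * n_products + 1
    return writes + result_words


if __name__ == "__main__":
    print(mac_bus_cycles(1, result_words=2))  # single multiply, 32-bit result
    print(mac_bus_cycles(5))                  # five-point filter, 40-bit result
```

With one product and a 32-bit result this gives the 5 cycles quoted in the data sheet; a five-point filter with a full precision read takes 14.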


MICROPROCESSOR INTERFACE


The NCR45CM16 is easily interfaced to both the 68000
and 8086 microprocessors. Typical interface circuitry
for both micros can be seen in Figures 4a through 4c. Examples
of 68000 and 8086 assembly code used with the multiplier
are included at the end of this application note.


[Block diagram: INPUT/OUTPUT PORT, MULTIPLIER ARRAY, CONTROL.]

Figure 2. Functional Block Diagram




READ OPERATIONS (WE = 1)

A3  A2  A1  A0   OPERATION
X   0   0   0    Read bits 0 through 15 of result from accumulator
X   0   0   1    Read bits 16 through 31 of result from accumulator
X   0   1   0    Read bits 32 through 47 of result* from accumulator

*NOTE: Accumulator accumulates to 40 bits. Thus bits 0 - 39 are valid while bits 40 - 47 are a sign extension of bit 39.


DIVIDE BY 2 AND READ (WE = 1)

A3  A2  A1  A0   OPERATION
X   1   0   0    Read bits 1 through 16 of result from accumulator
X   1   0   1    Read bits 17 through 32 of result from accumulator
X   1   1   0    Read bits 33 through 48 of result* from accumulator

X = Don't care

*NOTE: Accumulator accumulates to 40 bits. Thus bits 1 - 39 are valid, while bits 40 - 48 are a sign extension of bit 39.


WRITE OPERATIONS (WE = 0)

A3 A2 A1 A0   OPERATION
0  0  0  0    Clear Accumulator; Retain X and Y
0  0  0  1    Clear Accumulator; Write new data to Y
0  0  1  0    Clear Accumulator; Write new data to X
0  0  1  1    Clear Accumulator; Write new data to X and Y
0  1  0  0    Add X·Y to Accumulator; Retain X and Y
0  1  0  1    Invalid Operation
0  1  1  0    Add X·Y to Accumulator; Write new data to X
0  1  1  1    Add X·Y to Accumulator; Write new data to X and Y
1  0  0  0    Retain Accumulator; Retain X and Y (NOP)
1  0  0  1    Retain Accumulator; Write new data to Y
1  0  1  0    Retain Accumulator; Write new data to X
1  0  1  1    Retain Accumulator; Write new data to X and Y
1  1  0  0    Subtract X·Y from Accum.; Retain X and Y
1  1  0  1    Invalid Operation
1  1  1  0    Subtract X·Y from Accum.; Write new data to X
1  1  1  1    Subtract X·Y from Accum.; Write new data to X and Y

A = Accumulator
Data latched into X-register
Data latched into Y-register


Figure 3. Read and write operations of the NCR45CM16
are determined by 4 address pins (A0-A3) and the
write enable (WE) pin.





[Figure 4a diagram: MC68000 to 45CM16 INTERFACE. 68000 D0-D15 connect to 45CM16 D0-D15; R/W and A1-A4 connect to the 45CM16, with a DECODER generating the chip select.]

Figure 4a.

[Figure diagrams: 8086 to 45CM16 INTERFACE, MAXIMUM MODE, and 8086 to 45CM16 INTERFACE, MINIMUM MODE, each showing 8086 D0-D15, ADDRESS LATCHES, A0-A3 and the 45CM16.]


EXAMPLE:


Using the NCR45CM16 for Assembly Language Multiplication/Addition - 68000 Application


The NCR45CM16 can speed up compute-bound problems
on 16 bit microprocessors. One application that
benefits from adding a 45CM16 is the computation of
the sum of products:

A = (X1)·(Y1) + (X2)·(Y2) + ... + (Xn)·(Yn).

Code for implementing this algorithm on the MC68000
is given below:


* SUBROUTINE: Sum Products
*   A0 points to the first element in the X list
*   A1 points to the first element in the Y list
*   D0 contains the number of products to be summed minus 1
*      (DBF executes the loop D0+1 times)
*   Result of product sum returned in low byte of D3 plus D2
* Define memory mapping for relevant 45CM16 instructions:
NCRAREA  EQU     XXXX                      BASE ADDRESS FOR 45CM16 I/O
* Offsets for writes
WXYCLRA  EQU     $3                        WRITE TO BOTH X, Y; CLEAR A
ADDXYWX  EQU     $6                        ADD X·Y TO A; PUT NEW DATA IN X
WRITE_Y  EQU     $9                        WRITE NEW DATA TO Y
* Offsets for reads
A_LOW    EQU     $0                        LOW WORD OF 40 BIT ACCUM.
A_MID    EQU     $1                        BITS 16-31 OF A
A_HIGH   EQU     $2                        BITS 32-47 (40-47 EXTENDED)
* Executable code:
START    MOVE.W  #NCRAREA,A2
         CLR.W   WXYCLRA*2(A2)             SEND 0's TO X, Y, AND A
LOOP     MOVE.W  (A0)+,ADDXYWX*2(A2)       MULTIPLY/ACCUMULATE, NEXT X
         MOVE.W  (A1)+,WRITE_Y*2(A2)       WRITE NEXT Y
         DBF     D0,LOOP
         MOVE.W  D0,ADDXYWX*2(A2)          LAST MULTIPLY/ACCUMULATE
         MOVE.W  A_LOW*2(A2),D1            FETCH LOW WORD IN ACCUMULATOR
         MOVE.W  A_MID*2(A2),D2            FETCH BITS 16-31 IN A
         SWAP    D2                        MOVE MIDDLE A WORD TO HIGH D2
         MOVE.W  D1,D2                     CONVERT TO SINGLE 32-BIT
         MOVE.W  A_HIGH*2(A2),D3           FETCH HIGH ACCUMULATOR WORD
         RTS


Of course, the subroutine can be made to execute even
faster by using separate address registers to hold write
and read locations instead of using offsets. But even in
the above, register-conserving approach it is clear that
using the 45CM16 to do the multiply-and-accumulate
loop greatly reduces the overhead and shortens the code
of the corresponding loop for an unaided 68000. Without
the 45CM16 a programmer would have to use the
68000's own signed multiply instruction and a 32 bit
addition even to accumulate to just 32 bits. This requires
82 machine cycles of execution time for the unaided
68000 versus either 24 or 32, depending on the
addressing mode, for the same operations done through
the 45CM16 in a loop.


The largest disadvantage to the unaided approach, however,
is the overhead required to do accumulation. With
the 45CM16 at least 256 products can safely be added
and the high byte of the 40 bit accumulator can be
fetched at the end by the 68000. Without the 45CM16,
adding 32 bit quantities in succession requires overflow
checking and updating the bits in a second data register
on each addition. This results in still further delay in the
loop.


In short, there is more code and more delay in the unaided
multiply-and-accumulate loop than in a similar
loop executed through the 45CM16. With the multiplication
accelerator, much code becomes unnecessary and
the only additional code required for communicating
with the 45CM16 is that which fetches the result from
the accumulator at the end.
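
The claim that at least 256 products can safely be accumulated can be checked with a worst-case calculation. The Python sketch below assumes signed 16-bit operands and a signed 40-bit accumulator, as described in this note; the names `LARGEST_PRODUCT`, `ACC_MAX` and `max_safe_products` are hypothetical.

```python
# Largest magnitude product of two signed 16-bit operands: (-32768) * (-32768).
LARGEST_PRODUCT = (-2**15) * (-2**15)      # 2**30

# Largest positive value a signed 40-bit accumulator can hold.
ACC_MAX = 2**39 - 1


def max_safe_products():
    """Number of worst-case products that fit without overflow."""
    return ACC_MAX // LARGEST_PRODUCT


if __name__ == "__main__":
    print(256 * LARGEST_PRODUCT <= ACC_MAX)
    print(max_safe_products())
```

Under these assumptions 256 worst-case products occupy 2^38, comfortably inside the accumulator, and the bound works out to 511 worst-case products before overflow becomes possible.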




Using The NCR45CM16 For Assembly Language Multiplication/Addition on the 8086 


The NCR45CM16 MAC can be interfaced with the Intel
8086 bus in two ways: memory-mapped mode or I/O
mode. In the memory-mapped mode, the 45CM16 acts
as a 16-bit-wide RAM device connected to the 8086 bus.
Data transfer between the 8086 and the 45CM16 is
achieved by using one of the 23 different addressing
modes available with the MOV instruction. In the
I/O-mapped mode, the 45CM16 acts as a 16-bit-wide I/O
device or peripheral connected to the 8086 bus. Data
transfer between the 8086 and the 45CM16 is achieved
by using only two I/O instructions, IN and OUT.


The advantage of connecting the 45CM16 in the
memory-mapped mode is the availability of a large number of
addressing modes for data transfer. However, all of these
modes, except one, have higher execution times than the
simple IN and OUT instructions. To minimize the time
associated with multiply/accumulate operations on the
8086, the I/O interfacing mode is used in this example.
Comparing the unaided 8086 to the 45CM16 used in the
I/O mode indicates a speedup of approximately 3X for
the multiply operation.


The following assembly language subroutine is used to
calculate SUM as

SUM = X(1) * Y(1) + X(2) * Y(2) + X(3) * Y(3) + ... + X(N) * Y(N)

The 45CM16 and data arrays are mapped into the memory
space 00 hex to FF hex. An Intel Intellec IV development
system was used to write the subroutine.
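
Before reading the listing, it may help to see the same arithmetic in a higher-level form. The following is a minimal Python sketch of what the subroutine computes, not a model of the 45CM16 itself. It assumes signed operands and a 40-bit accumulator whose top byte is sign-extended when the high word is read; the function names are hypothetical.

```python
def sum_of_products(x, y):
    """Accumulate x[i] * y[i] in a 40-bit accumulator and return the
    result as three 16-bit words (low, mid, high)."""
    assert len(x) == len(y)
    acc = 0
    for xi, yi in zip(x, y):
        acc += xi * yi                 # one multiply/accumulate per pair
    acc &= (1 << 40) - 1               # keep 40 bits (two's complement)
    if acc & (1 << 39):                # assumed sign extension into bits 40-47
        acc |= 0xFF << 40
    low = acc & 0xFFFF                 # bits 0-15
    mid = (acc >> 16) & 0xFFFF         # bits 16-31
    high = (acc >> 32) & 0xFFFF        # bits 32-47
    return low, mid, high


def words_to_int(low, mid, high):
    """Reassemble the three words into a signed Python integer."""
    value = low | (mid << 16) | (high << 32)
    return value - (1 << 48) if value & (1 << 47) else value


print(sum_of_products([1, 2, 3], [4, 5, 6]))
```

The three returned words correspond to the three reads the host performs at the end of the loop; everything before that is a single write per operand.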


SERIES-III 8086/8087/8088 MACRO ASSEMBLER V1.0 ASSEMBLY OF MODULE SUM_OF_PRODUCTS
OBJECT MODULE PLACED IN :F1:NCR.OBJ
NO INVOCATION LINE CONTROLS

        NAME    SUM_OF_PRODUCTS
;
; THIS SUBROUTINE CALCULATES THE SUM OF PRODUCTS AS
; SUM=X(1)*Y(1)+X(2)*Y(2)+...+X(N)*Y(N)
; IT IS ASSUMED THAT NO OVERFLOW OCCURS DURING THIS COMPUTATION
;
; THE CALLING PROGRAM SETS THE VALUE OF BP REGISTER TO THE BASE
; ADDRESS OF THE DATA REQUIRED FOR THIS SUBROUTINE
;
; N = NUMBER OF MULTIPLICATIONS IN THE SUM
; X AND Y = ARRAYS TO BE MULTIPLIED AND ADDED
; SUM_LOW, SUM_MID, SUM_HIGH = VARIABLES WHERE THE SUM IS STORED
;
;              <---- 16 BITS ---->
;              -------------------
;   BP ----->  |        N        |   2 BYTES
;              -------------------
;              | 1    X ARRAY    |
;              | 2               |   2N
;              | .               |   BYTES
;              | N               |
;              -------------------
;              | 1    Y ARRAY    |
;              | 2               |   2N
;              | .               |   BYTES
;              | N               |
;              -------------------
;              |     SUM_LOW     |
;              |     SUM_MID     |   6 BYTES
;              |     SUM_HIGH    |
;              -------------------
;
        ASSUME  CS:CODE




CODE    SEGMENT
;
SUM     PROC    NEAR
;
CLA_WRX1   EQU  11100100B
CLA_WRY1   EQU  11100010B
ADDXY_WRX  EQU  11101100B
NOP_WRY    EQU  11110010B
ADDXY_NOP  EQU  11101000B
RD_LOW     EQU  11100000B
RD_MID     EQU  11100010B
RD_HIGH    EQU  11100100B
;
; THE 8 EQUATE STATEMENTS DEFINE THE ADDRESSES REQUIRED
; FOR VARIOUS OPERATIONS OF NCR 45CM16. THESE ARE BASED ON
; THE I/O DECODING SCHEME SHOWN IN THE ENCLOSED REPORT.
;
        MOV     SI, BP          ; SI HOLDS THE
                                ; BASE ADDRESS
        MOV     CX, [SI]        ; CX=N
        MOV     BX, CX          ; BX=N
        ADD     BX, BX          ; BX=2N
        INC     SI
        INC     SI              ; NOW SI POINTS TO
                                ; FIRST ELEMENT OF
                                ; X ARRAY
;
        MOV     AX, [SI]        ; TAKE X(1) TO AX
        OUT     CLA_WRX1, AX    ; CLEAR ACC OF 45CM16
                                ; AND WRITE X(1)
        MOV     AX, [SI+BX]     ; TAKE Y(1) TO AX
        OUT     CLA_WRY1, AX    ; CLEAR ACC OF 45CM16
                                ; AND WRITE Y(1)
;
        DEC     CX              ; CX=CX-1
;
REPEAT: INC     SI
        INC     SI              ; SI=SI+2
;
        MOV     AX, [SI]        ; TAKE NEXT X TO AX
        OUT     ADDXY_WRX, AX   ; MULTIPLY X AND Y
                                ; AND ADD TO ACC
                                ; WRITE X TO 45CM16




QN3 


11zs9 OW ‘¥IlaWwn0D 
‘TYNOSSIW JO ALISYSAINN “SNITAS3SNIONA S3LNdWOD GNV Ww9141937a 
4O INSWLuvda0 LY AIWHIOD *d NWIAWA AG NALIYM SI aNILNOYENS SIHL 


ay a ee? a a a a, ee ey 


        MOV     AX, [SI+BX]     ; TAKE NEXT Y TO AX
;
        OUT     NOP_WRY, AX     ; WRITE NEXT Y TO 45CM16
;
        LOOP    REPEAT          ; DECREMENT CX AND
                                ; GOTO REPEAT IF
                                ; CX NOT EQUAL TO 0
        OUT     ADDXY_NOP, AX   ; DO X(N)*Y(N) AND
                                ; ADD TO ACC
        IN      AX, RD_LOW      ; READ LOW 16 BITS
        MOV     [SI+BX+2], AX   ; STORE AT SUM_LOW
        IN      AX, RD_MID      ; READ MID 16 BITS
        MOV     [SI+BX+4], AX   ; STORE AT SUM_MID
        IN      AX, RD_HIGH     ; READ HIGH 16 BITS
        MOV     [SI+BX+6], AX   ; STORE AT SUM_HIGH
;
        RET                     ; RETURN FROM SUBROUTINE
SUM     ENDP
CODE    ENDS

ASSEMBLY COMPLETE, NO ERRORS FOUND







NCR Microelectronics Division 2001 Danfield Ct. Fort Collins, Colorado 80525
Telex: 045-4505 NCRMICRO FTCN Phone: 303/226-9500 303/223-5100 


ADVERTISERS' EDITION

ELECTRONIC DESIGN
FOR ENGINEERS AND ENGINEERING MANAGERS - WORLDWIDE

OCTOBER 31, 1984



BEHIND THE COVER 





Creating one of the first processor chips to depart completely
from the traditional von Neumann architecture is a heady
undertaking. For a company involved in commercial
microelectronics for only three years, it might seem
brash, but that didn’t faze NCR.

“I’m not sure that the management believed in
the systolic array processor all that much,” recalls
Paul Sullivan, “but they let us go ahead.” Indeed, he
admits to having no more than a passing interest in
systolic arrays when the engineers at Martin
Marietta brought up the concept in a 1982 meeting.

But Sullivan, the head of advanced development
at NCR’s Microelectronics Division, was soon
convinced of the array’s importance, and his enthusiasm
proved contagious. The result, an architecture
that links a number of processors handling data
quickly in parallel, is the topic of this issue’s cover
story (p. 207).

Less than a year after the project got under way,
first silicon rolled off the line. Although the initial
array of 3 by 6 processors came in at a staggering
200,000 mils² (the fabrication personnel thought it
ridiculous to continue), it proved that the design
was sound. Further, it furnished a breadboard to
help assemble the final instruction set.

Cutting the chip down to size (by half) while
quadrupling the number of processors was no easy
task. In fact, the 6-by-12-processor array was the
first NCR part to incorporate a second layer of
metallization, as well as the densest CMOS device
the company ever ran through fabrication. “I’m still
not sure that tackling both these firsts
simultaneously was a good idea,” confesses Sullivan.

Finding suitable CAD tools for such a complex
chip was a problem in itself, particularly because of
the difficulty in checking design rules at the
densities involved. Also, the amount of information that
made up the design data base was so massive that
most machines employed in fabricating masks
simply couldn’t accept it.

In a project this ambitious, teamwork was
essential. At the start, though, senior circuit designer
Dave Thomas was about the only individual
assigned full-time to the processor. He did stay in close
contact with Martin Marietta’s Wlodzimierz
Holsztynski, the mathematician responsible for the
architecture of the processor elements.



DESIGN ENTRY 





Systolic array chip 
matches the pace of 
high-speed processing 





A monolithic systolic array packs 72 single-bit 
parallel processors, letting it clip along at 
the rates demanded to process images in real time. 





This is the first in a series dealing with systolic 
arrays. Subsequent articles will investigate 
such applications as pattern recognition, image 
manipulation, and data-base management. 


The herculean demands of real-time image
processing, automated inspection, and
artificial intelligence clearly reveal the
limitations of the traditional von Neumann
machine. Since that architecture
handles only one piece





Ronald Davis and 
Dave Thomas, NCR Corp. 


Ronald Davis is a product 
marketing engineer in charge 
of digital signal processing at 
NCR's Microelectronics Di- 
vision (Fort Collins, Colo.). He 
holds a Master of Science de- 
gree in electrical engineering 
from MIT and has worked for 
Bell Laboratories, as well as at 
Intel and IBM. 


Dave Thomas has worked as 

a senior design engineer in the 
Microelectronics Division for 
two years. Previously, he de- 
signed circuits for Texas In- 
struments and the Naval Avi- 
onics Center. Thomas holds 
both master’s and bachelor’s degrees
in electrical engineering
from Purdue University.


Reprinted from ELECTRONIC DESIGN - October 31, 1984 





of data at a time, it severely constrains the 
speed with which information can be processed. 
Further, because incoming data must be held 
until it can be put to use, the approach inher- 
ently calls for a large amount of memory. 

Some problems are alleviated by turning to 
parallel architectures, in which micropro- 
cessors or semicustom devices are linked. The 
sheer number of com- 
ponents involved, though, 
makes the size of such 
systems an obstacle in its 
own right. In short, one set 
of difficulties is traded off 
for another. What’s more, 
parallel processing does lit- 
tle to cut back on storage 
space. 

A solution is finally at 
hand, in the form of the 
first commercial systolic 
array processor chip. The 
Geometric Arithmetic 
Parallel Processor (GAPP) 
overcomes the intrinsic 
problem associated with 
the von Neumann comput- 
er by loading 72 bit-serial 
parallel processor cells on- 
to a single IC (see “Systolic
Arrays: The Heart of
the Matter”). Each of




the processor elements contains an ALU and 
128 bits of RAM, as well as bidirectional com- 
munication lines that connect the cell to its 
neighbors on the north, south, east, and west. In 
addition, a separate I/O communication bus al- 
lows data to be input from the south end of the 
array and output to the north without inter- 
fering with computation within the ALU. 

Central to the IC’s makeup is its single- 
instruction, multiple-data architecture, which 
distributes processing power among each of the 
identical bit-serial processor elements. Further 
boosting speed in a large number of applica- 
tions is the ability to cascade a number of ICs to 
form large arrays. 

Interestingly, the processor elements themselves
are not particularly fast, taking 2.5 µs to
add two 8-bit numbers. Executing 72 such operations
simultaneously, though, yields an overall
data rate of 28 million additions per second
for each device.

Assembling an array of chips also eliminates 
the bandwidth limitations that plague von Neu- 
mann machines. For instance, a 48-by-48-cell 
systolic processor —comprising 32 chips—can 


grab a 48-bit-wide word every 100 ns when oper- 
ating with a 10-MHz clock. The array’s band- 
width thus equals 480 Mbits/s. 
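
The throughput and bandwidth figures quoted above follow directly from the per-cell numbers. The short sketch below reproduces them; it uses only values stated in the text (72 cells per chip, 2.5 µs per 8-bit addition, a 10-MHz clock, a 48-by-48 array), and the variable names are illustrative.

```python
# Sketch: reproduce the per-chip add rate and the array bandwidth.
CELLS_PER_CHIP = 72
ADD_8BIT_NS = 2_500          # 2.5 us per 8-bit addition in one cell
CLOCK_HZ = 10_000_000        # 10-MHz clock, one word accepted per cycle
ARRAY_SIDE = 48              # 48-by-48-cell array

adds_per_second_per_chip = CELLS_PER_CHIP * 1_000_000_000 // ADD_8BIT_NS
chips_in_array = ARRAY_SIDE * ARRAY_SIDE // CELLS_PER_CHIP
bandwidth_bits_per_second = ARRAY_SIDE * CLOCK_HZ   # one 48-bit word per 100 ns

print(adds_per_second_per_chip)    # additions per second for one chip
print(chips_in_array)              # chips needed for the 48-by-48 array
print(bandwidth_bits_per_second)   # input bandwidth in bits per second
```

The first result, 28.8 million, is the source of the "28 million additions per second" figure.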

In pattern recognition and automated in- 
spection, the chip’s ability to handle entire 
images concurrently, instead of one pixel at a 
time, as is traditionally done, greatly speeds 
throughput. Since images are taken care of by a 
single chip, interactions with the host are elim- 
inated, as are the inherent restrictions of 
memory transfers using the system bus. And 
because a single cell can be mapped for each pix- 
el in an image, adding more chips pushes speed 
even higher. 


A juggling act 


The systolic array also is particularly suited 
to digital signal processing. Unlike single- 
instruction, single-data computers which can- 
not simultaneously calculate a host of basic 
operations (like multiplication, convolution, 
and trigonometric functions), the chip can per- 
form several of these common signal-process- 
ing operations concurrently. 

Additionally, to minimize the number of data 


Systolic arrays: The heart of the matter 


A systolic array is a regular arrangement of sim- 
ple, identical processor elements that are connected 
to their nearest neighbors. The term “systole” was 
originally used to refer to the recurrent contractions 
of the heart. As with the human circulatory system, 
systolic computations are characterized by the 
pumping of data through an array of processor ele- 
ments. While data moves in and out of the processor 
element, some operation is performed on it during 
each cycle. This maintains a regular flow, or circu- 
lation, of data within the network. Although defini- 
tions vary, systolic processors must first of all run in 
sync with a global system clock, so that data is rhyth- 
mically computed and passed through the network. 

An array can be extended arbitrarily by con- 
necting two or more processor elements to increase 
speed linearly with the number of elements. A good 


measure of the efficiency of an array processor vis-a- 
vis a single processor is the so-called speed-up factor, 
which is defined as the processing time for a single 
processor divided by that of an array. 

The systolic architecture is a natural one; it is a 
subset of the cellular automaton—a uniform array of 
many identical cells in which each cell interacts only 
with its neighbors. Interestingly enough, it was the 
father of conventional computer architecture, von 
Neumann, who performed some of the earliest in- 
vestigations into the cellular automaton structure as 
a potential machine configuration. Harbingers of 
today’s systolic arrays are the Illiac IV system, devel-
oped in the late 1960s, and the massively parallel 
processor built by Goodyear Aerospace in 1981. 
Systolic chip architectures were developed at 
Carnegie-Mellon. 




fetches normally required, reads from or writes
to RAM can be performed on every cycle, at
the same time as computations. That ability is 
significant in real-time processing and many 
number-crunching applications, which are by 
nature memory-intensive. It also lends itself to 
artificial intelligence, in which volumes of data 
requiring extensive parallel processing are the 
rule rather than the exception. 

The chip operates at 5.0 ±0.5 V and dissipates
500 mW at a 10-MHz clock rate. Data setup and
hold times are 10 and 5 ns, respectively.
Although 72 processors are now riding on the
100,000-mil² chip, future versions will go far
beyond that. The 6-by-12 array is now fabricated
with a 3-µm double-layer metal CMOS process,
but shrinking line widths to 1 µm will crowd 512
cells onto an equal-sized chip. Clearly, such
density makes CMOS the technology of necessity.

The array is housed either in a ceramic
84-lead pin-grid array or a plastic chip carrier
(with the same number of contacts). As the
number of processing elements is increased to
512, only 162 pins will be needed, not quite
double the number now employed.


A peek inside 


Each processor element contains separate
lines that link the cell to its neighbors and to the
outside world. In addition to the North/South
(N/S) and East/West (E/W) lines that pass
data between cells, there are the CM South input
(CMS) and CM North output (CMN) (Fig. 1).
There is also a complement of 22 external signal
lines: 7 address lines (A0 through A6), 13 control
lines (C0 through C12), one global output (GO),
and one clock (CLK).

The chip’s overall simplicity is reflected in
the layout of a single processor element (Fig. 2).
Each of its four latches, CM, N/S, E/W, and C
(referred to as the C register), accepts data
from up to eight possible sources, depending
upon the setting of the control lines. C0 and C1
control the input to the CM latch; C2 through C4
govern the input to the N/S latch; C5 through C7
manage the input to the E/W latch; and C8
through C10, the input to the C register. Lines
C11 and C12 handle reads from and writes to the
128-bit RAM.

Working from a truth table, the array performs
additions and subtractions (Table 1). The
C, NS, and EW inputs to the multiplexers represent
the contents of the C, N/S, and E/W registers,
respectively. The summing output of the
single bit ALU, SM, goes directly to the RAM 
and may also be simultaneously input to any of 
the four registers. The Carry and Borrow out- 
puts (CY and BW, respectively) are open to the 
C register. A truth table is used as well to fulfill 
single- and dual-input logic functions—logical 
complement, exclusive-OR, exclusive-NOR, 
logical AND, and logical OR—on data in the 
N/S and E/W latches (Table 2). 
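
To make the bit-serial arithmetic concrete, here is a minimal sketch of a 1-bit adder/subtracter with the three outputs named above (SM, CY, BW), followed by an 8-bit addition carried out one bit at a time. The truth tables themselves are not reproduced in this text, so the equations below are the standard full-adder and full-subtracter relations and should be read as an assumption; the convention that BW is the borrow from NS minus EW minus C is likewise assumed, and the function names are hypothetical.

```python
def alu(ns, ew, c):
    """1-bit ALU sketch: returns (SM, CY, BW) for input bits NS, EW and C."""
    sm = ns ^ ew ^ c                                   # sum / difference bit
    cy = (ns & ew) | (ns & c) | (ew & c)               # carry out of NS + EW + C
    not_ns = 1 - ns
    bw = (not_ns & ew) | (not_ns & c) | (ew & c)       # borrow out of NS - EW - C
    return sm, cy, bw


def bit_serial_add(a, b, width=8):
    """Add two unsigned numbers one bit per step, as a single cell would."""
    c, total = 0, 0
    for i in range(width):
        sm, c, _ = alu((a >> i) & 1, (b >> i) & 1, c)  # carry is fed back via C
        total |= sm << i
    return total | (c << width)                        # final carry is the top bit


def bit_serial_sub(a, b, width=8):
    """Subtract b from a bit-serially; returns (difference mod 2**width, borrow)."""
    c, diff = 0, 0
    for i in range(width):
        sm, _, c = alu((a >> i) & 1, (b >> i) & 1, c)  # borrow is fed back via C
        diff |= sm << i
    return diff, c


print(bit_serial_add(200, 100), bit_serial_sub(5, 7))
```

Because every cell receives the same control signals, one pass through such a loop performs the addition in all cells of the array at once.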


What are your instructions? 


The chip is programmed with a sequence of 
instructions that, when compiled by an assem- 
bler, directs the appropriate control signals to 
every cell in the array. Up to five commands, 
one from each of the five groups that make up 
the overall set, can be executed simultaneously 
on every instruction cycle. The possible combinations
of horizontally microcoded instructions
result in nearly 6000 commands (see
Table 3).
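
The size of the combined instruction set can be estimated by multiplying the number of choices in each of the five groups. The sketch below counts the instructions as they are named in Table 3; the short mnemonics are illustrative shorthand rather than official assembler syntax, and the number of control lines per group is an assumption based on the control-line description in the text.

```python
from math import prod

# (assumed number of control lines, instructions named in Table 3) per group
GROUPS = {
    "CM":  (2, ["NOP", "CM:=RAM", "CM:=CMS", "CM:=0"]),
    "NS":  (3, ["NOP", "NS:=RAM", "NS:=N", "NS:=S", "NS:=EW", "NS:=C", "NS:=0"]),
    "EW":  (3, ["NOP", "EW:=RAM", "EW:=E", "EW:=W", "EW:=NS", "EW:=C", "EW:=0"]),
    "C":   (3, ["NOP", "C:=RAM", "C:=NS", "C:=EW", "C:=CY", "C:=BW", "C:=0", "C:=1"]),
    "RAM": (2, ["READ", "RAM:=CM", "RAM:=C", "RAM:=SM"]),
}

control_lines = sum(bits for bits, _ in GROUPS.values())
combinations = prod(len(ops) for _, ops in GROUPS.values())

print(control_lines, combinations)
```

One choice from each group gives a little over 6000 combinations on 13 control lines, which is in line with the figure quoted above.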

Images are manipulated by the array pro- 
cessor at a brisk pace. A 3-by-3-pixel mask with 


[Figure 1 labels: 1-bit ALU; 4 registers; RAM (128 bits)]

1. Individual processor elements link to their next-door
neighbors on the north, south, east, and west
over bidirectional communication lines N/S and E/W,
respectively. The control lines C0 through C12 and
address lines A0 through A6 establish the signal paths to
the outside world. Each cell contains a 1-bit ALU, four
registers, and a 128-bit RAM.







an 8-bit gray scale can be convolved with an
8-bit gray-scale image in less than 300 µs, using
what is termed a global broadcast operation. In
that mode, a single bit (1 or 0) is transmitted to
the C register of every processor in the array
(by toggling control line C8). Each processor
will perform the same function on the broadcast
data, thereby increasing throughput. Since
mask data can be broadcast globally, every pixel
in the mask can operate simultaneously on
the entire image. Thus, the 3-by-3 convolution
can be accomplished with nine sets of global
operations that include multiply, shift, and add.


Similarly, a 9-by-9-pixel mask with an 8-bit
gray scale can be convolved with an 8-bit
gray-scale image in less than 5 ms. The 8 bits of
gray scale furnish up to 256 shading intensities 
(from black to white)— better than the video 
signals coming from most TV cameras. 


To the north! 


As mentioned, the processor’s speed can be 
enhanced by linking several chips to create 
larger arrays. Moreover, doing so does not re- 
quire any changes in software. In addition, the 
systolic array has a Data Communication line 


[Figure 2 labels: Multiplexers; Full adder, subtracter]

2. The layout of a single processor cell mirrors the simplicity of the
overall array design. Each of the four registers, CM, N/S, E/W, and C,
accepts data from up to eight sources. The settings of the control lines
determine which information is sent to each register.


cessing, the parts that make up an image plane
can be shifted to the center of the convolution
window before the required multiplication is
performed. That increases efficiency because
two 8-bit integers form a 16-bit product. If the
shift were not carried out first, the product
would have to be shifted bit-serially to the
center of the window.

At the end of the convolution, the chip will
have performed a total of nine global 8-bit
multiplications, twelve shifts, and nine 16-bit
additions. The approximate execution time for each
multiplication is 25.2 µs; for a shift, 2.4 µs; and
for addition, 4.9 µs. Thus a total of just 299.7 µs
is expended on the operation.
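
The total can be verified by adding up the operation counts and times given in the paragraph above. The sketch below does so in integer nanoseconds to avoid rounding; the dictionary layout is illustrative.

```python
# Sketch: execution-time budget for the 3-by-3 convolution.
# Each entry is (number of operations, time per operation in nanoseconds).
OPERATIONS = {
    "8-bit multiplication": (9, 25_200),
    "shift":                (12, 2_400),
    "16-bit addition":      (9, 4_900),
}

total_ns = sum(count * ns for count, ns in OPERATIONS.values())
print(total_ns / 1000, "us")
```

The multiplications account for roughly three quarters of the total, so any speedup of the bit-serial multiply would have the largest effect on convolution time.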

A binary correlation mask using a binary im- 
age is conceptually identical to convolution, but 
bit-wide exclusive-OR operations are worked 
with instead of multiplications. The correlation 
creates a level of comparison, so that a decision 
threshold can be established to determine 
whether a match is close enough to meet system 
requirements. A score of 441 denotes a perfect 
match; 0 indicates an inverse image. Thresh- 


1. EW := RAM 0; C := 0 / Load pattern into E/W registers
2. EW := E / Shift right, then load into E/W
3. NS := EW; EW := E / Load shifted pattern into N/S, then shift right and load into E/W
4. EW := RAM 0; C := BW; NS := 0 / Reload original pattern; BW is N/S - E/W
5. C := CY
6. RAM 1 := C / Load output plane


3. A simple six-step algorithm allows the array processor
chip to recognize a 101 pattern. When the chip
accepts an “X blank X” (or 101) pattern from a camera
(a), it indicates a match by putting a 1 in the output
plane (b). Every 1 in the output plane indicates the
first X of the pattern.
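
The six steps can be traced in software on a single row of cells. The sketch below is one reading of the figure, not a verified model of the chip: it assumes that "EW := E" gives each cell the value held by its east neighbor, that cells at the east edge receive 0, that BW is the borrow from NS minus EW minus C, and that CY is the carry from NS plus EW plus C. All function names are hypothetical.

```python
def carry(ns, ew, c):
    return (ns & ew) | (ns & c) | (ew & c)


def borrow(ns, ew, c):
    not_ns = 1 - ns
    return (not_ns & ew) | (not_ns & c) | (ew & c)


def from_east(plane):
    """Every cell takes its east neighbour's bit; the east edge receives 0."""
    return plane[1:] + [0]


def recognize_101(row):
    n = len(row)
    ram0 = list(row)
    ew, c, ns = list(ram0), [0] * n, [0] * n              # step 1
    ew = from_east(ew)                                    # step 2: next pixel
    ns, ew = ew, from_east(ew)                            # step 3: next and next-but-one
    bw = [borrow(a, b, k) for a, b, k in zip(ns, ew, c)]  # 1 where next=0, next-but-one=1
    c, ns, ew = bw, [0] * n, list(ram0)                   # step 4
    c = [carry(a, b, k) for a, b, k in zip(ns, ew, c)]    # step 5: original AND borrow
    return c                                              # step 6: written to RAM 1


print(recognize_101([1, 0, 1, 0, 1, 1, 0, 1, 0, 0]))
```

The borrow output does the work of a "0 followed by 1" detector, and the carry output then ANDs that result with the original pixel, so no dedicated logic instructions are needed.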





olds can be set at any level in between to deter- 
mine pass or fail. 
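
A small sketch may clarify how the score is formed. It assumes the score is the number of mask positions at which the image bit agrees with the mask bit, that is, the mask size minus the number of exclusive-OR mismatches, which gives 441 for a perfect 21-by-21 match and 0 for an inverse image as stated above. The function name and the threshold value are hypothetical.

```python
def correlation_score(mask, window):
    """Count the positions where a binary mask and an image window agree."""
    mismatches = sum(m ^ w
                     for mask_row, window_row in zip(mask, window)
                     for m, w in zip(mask_row, window_row))
    return len(mask) * len(mask[0]) - mismatches


SIZE = 21
mask = [[(r + c) % 2 for c in range(SIZE)] for r in range(SIZE)]   # checkerboard
inverse = [[1 - bit for bit in row] for row in mask]
noisy = [row[:] for row in mask]
noisy[0][0] ^= 1                                                    # one flipped pixel

THRESHOLD = 430   # illustrative pass/fail level
print(correlation_score(mask, mask),
      correlation_score(mask, inverse),
      correlation_score(mask, noisy) >= THRESHOLD)
```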

Processing a 21-by-21-pixel binary cor- 
relation mask takes one exclusive-OR, 1536 
shift operations, and 400 additions. The 
exclusive-OR and shift operations take 300 ns
apiece, and additions take 1.6 µs. Total
execution time is thus 1.3 ms.


Moving pictures 


In an image-processing system, a series of 
chips can be combined to perform various func- 
tions. For example, a Multibus-based setup can 
be built around an arrangement of 48-by-48- 
processor cells that store and manipulate 
image data (Fig. 4). Incoming video data is 
temporarily stored in a row of eight GAPP de- 
vices that serve as a line buffer. After an 8-bit 
analog-to-digital converter processes the im- 
age, the corner-turning row, as it is called, ac- 
cepts an 8-bit gray scale value for each pixel. 

While the camera is horizontally retracing, 
the video data is shifted from the buffer into the 
processing array. The first step of this shift is 
the corner-turning operation, which is per- 
formed by switching data from the EW latch 
into the CM latch of the line buffer. As each 
pixel’s 8 bits are clocked out of the video line 
buffer, they are transferred into the bottom 
row of processing elements, where they are 
stored in the internal RAM at addresses 0 
through 7. At the same time, the previous video 
line is shifted up one row. The entire operation 
requires 18 instructions, well within the array’s 
speed limitations. Operating at a 10-MHz clock, 
the chip can execute as many as 120 instructions
in the 12-µs horizontal retrace period of a
typical camera. Once the entire video frame has 
been loaded into the array, computations can be 
performed. 
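
The margin available during horizontal retrace follows from the clock rate. The sketch below assumes one instruction per clock cycle, which is how the 120-instruction figure above is obtained; the variable names are illustrative.

```python
# Sketch: instruction budget during one horizontal retrace period.
CLOCK_HZ = 10_000_000
RETRACE_NS = 12_000             # 12-us horizontal retrace
LOAD_INSTRUCTIONS = 18          # instructions needed to shift in one video line

budget = CLOCK_HZ * RETRACE_NS // 1_000_000_000
spare = budget - LOAD_INSTRUCTIONS
print(budget, spare)
```

Loading a line therefore uses less than a sixth of the retrace period, which leaves the remainder free for other work.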

The end result of these computations is a pro- 
cessed video frame that is held in the internal 
RAM of the cascaded arrays. Data can be sent 
from the arrays using the same instructions 
that were followed to load it. 

A feature that significantly increases 
throughput is the ability to transmit data and 
load in a new frame concurrently. For instance, 
if frames A and B are loaded into the array, the 
resultant frame, A’, is being computed while a 
third frame, C, is being loaded. Similarly, as A’ 




known as dilation: The image is shifted east, 
then an OR operation is performed with the 
original image. Then the image is shifted west 
and ORed, shifted north and ORed, and shifted 
south and ORed. This operation enlarges the 
width of all edges. Individual missing pixels are 
filled in to provide a continuous border. Larger 
gaps can be filled by shifting the image two or 
more units. To determine the true edge, the pro- 
cessor then erodes the image by shifting and 
performing an AND operation, effectively 
eliminating all the blocks it created during 
dilation except for those determined by the al- 
gorithm to be part of the actual image. 
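
The shift-and-OR and shift-and-AND operations described above can be sketched on a small binary image. This is an illustration of the technique only: images are plain lists of 0/1 rows, pixels shifted in from outside the image are assumed to be 0, and the function names are hypothetical.

```python
def shift(img, dy, dx):
    """Move the image dy rows down and dx columns right, filling with 0."""
    h, w = len(img), len(img[0])
    return [[img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
             for c in range(w)] for r in range(h)]


NEIGHBOURS = [(0, 1), (0, -1), (-1, 0), (1, 0)]   # east, west, north, south


def dilate(img):
    """OR the image with copies of itself shifted one pixel in each direction."""
    out = [row[:] for row in img]
    for dy, dx in NEIGHBOURS:
        shifted = shift(img, dy, dx)
        out = [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(out, shifted)]
    return out


def erode(img):
    """AND the image with copies of itself shifted one pixel in each direction."""
    out = [row[:] for row in img]
    for dy, dx in NEIGHBOURS:
        shifted = shift(img, dy, dx)
        out = [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(out, shifted)]
    return out


line = [[0] * 5 for _ in range(5)]
line[2] = [1, 1, 0, 1, 1]                 # an edge with one missing pixel
block = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
         for r in range(5)]               # a solid 3-by-3 block

print(dilate(line)[2], erode(block)[2])
```

On the array, each shift and each OR or AND is a single instruction applied to every pixel at once, so the cost of these operations does not grow with image size.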

Since an exact match between the object be- 
ing viewed and the image stored in memory is 
rare, convolution and correlation are necessary 
functions to determine how close the match ac- 


tually is. In the simple 101 pattern given earlier, 
perfect matches are simple to demonstrate. In 
the longer strings found in real-world pattern 
recognition, convolution and correlation adjust 
for minor discrepancies. 


Convolution and correlation 


Convolution is employed in edge enhancement,
for instance, to improve the quality of the
image. It also calls on the array’s ability to 
handle global broadcasts. In convolving a 3- 
by-3-pixel mask with an 8-bit gray scale, the 
mask is placed over every pixel in the image and 
the product terms in each 3-by-3-pixel window 
are summed. 
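
The summation can be organized the way a broadcast machine would do it: one mask value at a time, applied to the whole image. The sketch below is a plain software illustration under stated assumptions; it treats pixels outside the image as 0, works on ordinary integers rather than bit-serially, and uses hypothetical function names.

```python
def shift(img, dy, dx):
    """Move the image dy rows down and dx columns right, filling with 0."""
    h, w = len(img), len(img[0])
    return [[img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
             for c in range(w)] for r in range(h)]


def convolve3x3(img, mask):
    """Sum mask[my][mx] * img[r + my - 1][c + mx - 1] for every pixel (r, c)."""
    h, w = len(img), len(img[0])
    acc = [[0] * w for _ in range(h)]
    for my in range(3):
        for mx in range(3):
            weight = mask[my][mx]                   # one value broadcast to all cells
            shifted = shift(img, 1 - my, 1 - mx)    # bring that neighbour to each cell
            for r in range(h):
                for c in range(w):
                    acc[r][c] += weight * shifted[r][c]   # multiply and accumulate
    return acc


image = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(convolve3x3(image, box))
```

The two inner loops over r and c are what the array performs in parallel; only the nine-step outer loop remains sequential.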

Global broadcasting lets the system send a 
single portion of the 3-by-3-pixel mask to each 
of the cells within the array. To speed pro- 


Table 3. Instruction set for the systolic array processor

[Control-line bit patterns (columns C12 through C0) are not recoverable from this copy; the instructions in each of the five groups are listed below.]


Micro NOP
Load CM from RAM
Move from CMS into CM
Load zero into CM


Micro NOP
Load NS from RAM
Move from N into NS
Move from S into NS
Move from EW into NS
Move from C into NS
Load 0 into NS


Micro NOP
Load EW from RAM
Move from E into EW
Move from W into EW
Move from NS into EW
Move from C into EW
Load 0 into EW


Micro NOP
Load C from RAM
Move from NS into C
Move from EW into C
Load C from Carry
Load C from Borrow
Load 0 into C
Load 1 into C


Read from RAM
Load RAM from CM
Load RAM from C
Load RAM from Sum





DESIGN ENTRY 


Handling real-time images 
comes naturally 
to systolic array chip 


The internal memory and specialized algorithms 
of a systolic array IC cut the amount of hardware 
and boost the speed associated with image processing. 





This is the second in a series focusing on the 
first commercial systolic array processor chip, 
developed by NCR Corp.’s Microelectronics 
Division in Fort Collins, Colo. The opening 
article was the Oct. 31 cover story (p. 207). 
Upcoming discussions will investigate the de- 
vice’s use in pattern recognition, data-base 
management, and as an associative processor. 


Until recently, real-time image processing
has been a difficult task, calling for a
large amount of hardware. Most
high-performance systems comprise a frame buffer,
which stores the incoming image; a high-speed,
pipelined processor to carry out the needed
algebraic manipulations; and a second buffer to
retain the processed image. Although
interleaved sequential memory accesses in such





Wyndham Hannaway, G.W. Hannaway & Assoc. 
Gary Shea, Consultant 
William R. Bishop, Consultant 


Wyndham Hannaway heads G.W. Hannaway & Associates,
a technology consulting firm in Boulder, Colo.,
specializing in optics, image processing, and simulation.
He helped create the boards for the systolic array
and the algorithms for image processing.


Gary Shea is an independent consultant in image-
processing software. He holds a BS in mathematics
from the University of Colorado.


William R. Bishop is a consultant in image processing
at G. W. Hannaway & Associates.


Reprinted from ELECTRONIC DESIGN - November 15, 1984


setups make it possible to load and unload the 
buffers rapidly, the bandwidth of the memory- 
processor bus limits throughput. Furthermore, 
some image-processing algorithms require 
several fetches for each pixel, further cutting
into overall system speed.

The Geometric Arithmetic Parallel Processor
(GAPP) chip overcomes these obstacles
by supplying an array of 72 parallel bit-serial
processor elements, each of which is fitted with
128 bits of RAM. This configuration lets system
designers dedicate an individual processor
element to every pixel. To cut costs, though, many
systems could handle small groups of pixels or
subimages serially, assigning more than one
pixel to a processor element, or cell. In fact, the
systolic array can be viewed as a combined
frame buffer and processor, bringing a
bit-mapped image into its RAM, processing it,
and then putting it back in RAM before sending
it out. One example of the chip’s prowess is its 
ability to store two images in its RAM and then 
deliver the difference between them. For design 
considerations, the monolithic array can also 
be considered a highly pipelined, parallel 
processor. 

Since the chip departs substantially from the 
conventional von Neumann architecture, 
image-processing systems based on it must 
vary from the usual as well. To demonstrate 
these differences, it is necessary to briefly ex- 
amine the traditional approaches. One, for in- 







is being sent, the next frame can be loaded. In 
this case, once three frames are loaded, real- 
time pipelined processing is obtained. The set of 
three 16-bit latches multiplexed onto the Multi- 
bus board also lets the host exchange data with 
the system. Finally, information can also be de- 
livered to the line buffer and sent to a video 
monitor using a d-a converter. 

The implementation of a control store lets 
the arrays receive a set of instructions from the 
host and store them, freeing the host for other 
tasks. The store operates in conjunction with a 
sequencer that watches for and maintains the 
correct sequence as the arrays perform their in- 
structions. 

A system such as this can also be imple- 
mented as a workstation for developing 
software meant for the processor chip. An 


[Figure 4 labels: multiplexer; south; processor elements (32 systolic processor chips)]


upcoming software simulator and assembler, 
running in conjunction with the workstation, 
will allow users to load input data into a group 
of arrays, run through a sequence of instruc- 
tions, and transmit the results back to the host 
computer. Additionally, a software library of 
macrocells will form the basis of a high-level 
command set for the processor.


Acknowledgment 

The Geometric Arithmetic Parallel Processor was de- 
veloped in conjunction with Martin Marietta Aerospace 
Corp. for use in real-time target recognition.


[Figure 4 labels: instruction; processor control; store]


4. A single-board system can be built around two blocks of array processor chips. One block
serves as the processing unit; the second, as the line buffer. The control store retains the
commands from the host, freeing it to carry out other tasks.




since these chips can compute while handling 
the serial-to-parallel shift. Regardless of 
whether systolic arrays are used, the chip’s 
memory associated with each processor ele- 
ment allows it to simultaneously store up to 16 
images of 8 bits each, obviating the need for 
frame buffers. 


Quicker than the eye 


Once the architecture of the image-process- 
ing system is selected, the next concern is decid- 
ing on the number of systolic array chips (see 
“Welcoming Aboard the Systolic Array,” 
p. 293). When speed is the primary concern, a 
one-to-one relationship between processor ele- 
ments and pixels can be established. A block of 
512 by 512 processor elements, made up of about 
3700 chips, can perform 100 billion 8-bit addi- 
tions a second. In the thirtieth of a second it 
takes to bring in a typical television frame, 
every cell can execute 13,333 8-bit additions or 
333,335 primitive single-cycle instructions, far
more than the number demanded by many real-
time image-processing algorithms (see 
“Systolically Altered States,” p. 294). 

Thus instead of a simple 1:1 ratio between 
processor elements and pixels, a system might 
dedicate one element to a number of pixels and 
thus process data in the form of windows. When 
one window is completed, processing can begin 
on the next. 
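The throughput figures quoted above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope aid: it assumes the 10-MHz clock, the 25-cycle 8-bit addition, and the 72 processor elements per chip stated in the article, and the function names are illustrative rather than part of any NCR tool.

```python
CLOCK_HZ = 10_000_000     # 10-MHz clock, as stated in the article
CYCLES_PER_ADD = 25       # one 8-bit addition takes 25 cycles
PES_PER_CHIP = 72         # processor elements per chip


def chips_needed(rows, cols):
    """Chips required to tile a rows-by-cols block (rounded up)."""
    return -(-(rows * cols) // PES_PER_CHIP)


def adds_per_second(rows, cols):
    """8-bit additions per second when every element adds in parallel."""
    return rows * cols * CLOCK_HZ // CYCLES_PER_ADD


def cycles_per_frame(frames_per_second=30):
    """Clock cycles available to each element during one video frame."""
    return CLOCK_HZ // frames_per_second


if __name__ == "__main__":
    print(chips_needed(512, 512))                 # chips for a 1:1 layout
    print(adds_per_second(512, 512))              # additions per second
    print(cycles_per_frame())                     # single-cycle instructions
    print(cycles_per_frame() // CYCLES_PER_ADD)   # 8-bit additions per frame
```

The same functions reproduce the smaller configurations mentioned elsewhere in the article, such as the 24-by-516 block and the 48-by-48 grid.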


Beat the clock 


In a system involving a real-time algorithm,
which does not require the use of previous im- 
age frames, the entire 512-by-512-pixel image 
need not be in the GAPP array all at once, 
thereby cutting the number of devices required. 
In a so-called neighborhood processing algo- 
rithm—one that determines the next value of a 
pixel by comparing it with the pixels surround- 
ing it—a block of 24 by 516 processor elements, 
consisting of 172 systolic devices, can carry out 
600 additions on every pixel while operating at 
10 MHz—far more processing power than 
available with conventional architectures.

Since less hardware is used, the necessary 
program may be larger and more complex than 
that found in architectures devoting one pro- 
cessor cell to every pixel. Despite such differ- 
ences, the algorithms share many attributes. In 





this set up, each pixel is stored in internal RAM, 
and although it might first appear that 128 bits 
of image data can be held in memory, the need 
to retain operands and intermediate results 
and to flag overflows reduces the chip’s capaci- 
ty somewhat. As in the first configuration, the 
number of systolic devices can be boosted or cut. 


A different point of view 


Programming the systolic array is radically 
different from programming a traditional mi-
croprocessor. The first is a single-instruction,
multiple-data path (SIMD) machine; the sec- 
ond, a single-instruction, single-data path 
(SISD) device. For that reason, code for an ex- 
isting chip cannot simply be converted: Writing 
software for the systolic chip demands a new 
way of looking at both the task and the neces- 
sary algorithm. 

To facilitate programming the systolic pro- 
cessor, a simulator that runs on personal com-
puters has been created. Written in C, the soft-
ware runs under Unix and operates on NCR’s PC-4
and on the IBM PC XT as well as on larger 
systems like the Digital Equipment PDP-11 


2. A video line buffer, which stores a full input line
from the camera, can be made up of either shift reg-
isters or systolic array chips. The 128 bits of RAM
included for each processor element in the GAPP
block eliminate the need for frame buffers. (Figure
labels: video line buffers built from 8- to N-bit shift
registers, parallel-to-serial and serial-to-parallel.)


DESIGN ENTRY 


Systolic image processor 





stance, relies on a pipelined ALU, with separate
frame buffers for input and output. Pipelining 
joins a series of processor elements to perform 
sequential arithmetic operations on a continu- 
ous data stream. The method is good with pro- 
cessors that range from bit-slice devices to 
supercomputers. Nonetheless, even the latter 
can perform only from 20 to 100 operations on 
each pixel to sustain a real-time rate of 10 mega- 
pixels/s, the rate of standard video systems. 

The systolic array can drop into such an ar- 
chitecture (Fig. 1). With 32 of the chips joined 
together to create a grid of 48 by 48 processor 
elements totaling 2304 processors, up to 60 mil- 
lion pixels/s can be accepted, even with a gray- 
scale depth of 8 bits a pixel. Since data can be 
loaded over the chip’s communication (CM) bus 
at the same time that it is processed, the grid 
array can operate at full speed at all times, 
chewing up 920 million macroinstructions ev- 
ery second. (A macroinstruction is defined here 
as an 8-bit addition that can be executed in 25 
cycles, or 2.5 µs.) Linking together more chips
further increases processing power. 

Despite its impressive speed, the architec- 
ture is not optimal for the systolic processor be- 
cause data must be reformatted to work with 
the array. The chip works with information in 
the form of bit planes. As a result, an 8-bit num-
ber representing the pixels must first be re-
formatted as a bit plane. The first bit plane rep-
resents the least-significant bits. Once in the
array, the whole plane is written to one location 


within the internal RAM of each processor ele-
ment. The next seven bits must be loaded simi-
larly, but such reformatting is too complex for
most frame buffers. 
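The bit-plane reformatting described above can be illustrated in software. The following is a minimal sketch, assuming an image held as a list of rows of integers; the helper names `to_bit_planes` and `from_bit_planes` are hypothetical and are not part of the NCR simulator.

```python
def to_bit_planes(image, bits=8):
    """Split an image into `bits` bit planes, least-significant plane first."""
    return [[[(pixel >> b) & 1 for pixel in row] for row in image]
            for b in range(bits)]


def from_bit_planes(planes):
    """Rebuild the integer image from its bit planes (inverse of the above)."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[b][r][c] << b for b in range(len(planes)))
             for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    image = [[0, 1, 255], [20, 44, 128]]
    planes = to_bit_planes(image)
    print(planes[0])                  # plane of least-significant bits
    print(from_bit_planes(planes))    # the original image again
```

Each plane would occupy one RAM address in every processor element, so an 8-bit image takes eight addresses.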


Shifting into first 


To overcome this hurdle, a designer can turn 
to serial-to-parallel shift registers long enough 
to store one full video line (Fig. 2). During the 
horizontal retracing period of the television
signal, the previous video line is shifted into the 
edge of systolic arrays, which can consist of any 
number of chips. The least significant bit of 
each pixel in the line is shifted into the bottom 
row of processor elements and written into 
RAM address 1. The next most significant bit is 
then shifted in and written to RAM address 2. 
The process continues until all eight bits of ev-
ery pixel line have been loaded into RAM ad-
dresses 1 through 8 of the bottom row of pro-
cessor elements.

Each RAM location of the block is read into 
the CM register before each shift into CM from 
the south (CM=CMS), so that the first video 
line is shifted up and written into the adjacent 
row of processor elements when the second line 
enters the bottom row of processor elements. 
Once the grid is filled, the same process occurs 
as the image is unloaded to the north and sent to 
the output video line buffer. 
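The line-by-line loading scheme can be modeled in a few lines. This is a sketch under stated assumptions: each processor element is reduced to a list of eight RAM cells (index 0 standing for RAM address 1), the CM register is a temporary copy, and the names `make_grid`, `load_line`, and `read_rows` are invented for the illustration.

```python
def make_grid(rows, cols, bits=8):
    """Model each processor element as a list of `bits` one-bit RAM cells."""
    return [[[0] * bits for _ in range(cols)] for _ in range(rows)]


def load_line(grid, line, bits=8):
    """Shift one video line into the south row; existing lines move north.

    Row 0 is the north edge and the last row is the south edge.
    """
    for b in range(bits):                                # LSB plane first
        cm = [[pe[b] for pe in row] for row in grid]     # RAM -> CM register
        cm = cm[1:] + [[(p >> b) & 1 for p in line]]     # CM := CM from south
        for row, cm_row in zip(grid, cm):
            for pe, bit in zip(row, cm_row):
                pe[b] = bit                              # CM -> RAM
    return grid


def read_rows(grid):
    """Convert the RAM contents back to integer pixel values."""
    return [[sum(bit << b for b, bit in enumerate(pe)) for pe in row]
            for row in grid]


if __name__ == "__main__":
    grid = make_grid(3, 4)
    load_line(grid, [1, 2, 3, 4])
    load_line(grid, [5, 6, 7, 8])
    print(read_rows(grid))
```

After two lines are loaded, the first line sits one row above the second, mirroring the upward shift described in the text.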

The line buffers can be designed with either 
shift registers or with systolic array devices. 
The latter approach enhances performance, 


1. A Geometric Arithmetic Parallel Processor can be substituted for traditional mi- 
croprocessors in a pipelined architecture. The arrangement requires the memory 
to be very wide, and data to be reorganized. It is thus better to reconfigure the 
architecture to take advantage of the chip’s properties. 


Welcoming aboard the systolic processor

Since the Geometric Arithmetic Parallel Pro-
cessor differs so radically from traditional pro-
cessors, a number of aspects of design must be con-
sidered when a pc board is laid out. Foremost among
these are the communications lines that join a block
of systolic processors.

No support circuitry is called for between the
chips, which themselves are easily linked to their
neighbors to the north, south, east, and west. In that,
they resemble an individual processing element
within a single array, which is joined to its four
nearest neighbors. Further, the 84-contact packages
are readily connected since the North output of one
IC is physically adjacent to the South output of an
adjoining chip. The East and West ports are similar-
ly compatible.

Terminating the outer edges of a block of arrays
demands a variety of techniques, depending upon
which algorithm is being executed. That presents no
problem, though, since a programmable multiplexer
can switch from one termination technique to an-
other, under software control.

On the one hand, the edge connections can be
grounded during input cycles so that all shifts bring
in zeros from the outer edge of the block. Alterna-
tively, the edges may be tied to a data bus for I/O.
A third approach brings the connections from the east
and north around to those of the west and south,
respectively, so that data is recycled.

These connections can be made without concern for
loading and fan-out, since they involve only the
processor elements at the edge of the group of chips.
Control, address, and clock signals, however, must be
bused to
each device in a grid of chips. In wraparound lay- 
outs, synchronization is critical between the clock 
and control lines at the edges of the block. 

When large blocks of the chips are grouped to- 
gether, it is generally best to drive them in groups 
of less than 40 chips. Driving more chips can skew 
timing and may exceed the power capabilities of 
driver chips. The routing for this type of bus is 
best laid out using an H-shaped topology (see the 
figure). 

When a number of chips are being clocked syn- 
chronously and driven in parallel by command 
drivers, power distribution must be uniform. There- 
fore, boards using wire-wrapped interconnections 
should have full surface power and ground planes. 
Inattention to the capacitive details of coupling and 
ground planes can cause undershoot and overshoot 
of signals. To supply a new control word every 100
ns, keeping pace with the device's 10-MHz clock, a 
20-bit-wide instruction queue for both the control 
and address lines is needed. Most designs, however, 
should include 24 or more extra bits to ensure space 
for control functions and looping. Static RAMs are 
the simplest to use for this; however, for high speed 
2k-by-8-bit RAMs are preferred. 

The instruction 
queue in a system 
based on a systolic ar- 
ray is driven by high- 
speed address sequen- 
ces. The four extra 
bits in a 24-bit-wide 
instruction queue can 
be used to control 
jumps and loops of an 
address sequencer. 
The Global Output 
signal from the array 
can serve as a flag for 
conditional jumps. 


















[Figure: H-shaped bus routing for a block of Geometric Arithmetic Parallel Processor chips. Bus inputs: Clock, Control (C0-C12), Processor Element RAM address (R0-R6).]








and NCR’s Tower 1632. 

Although the advantages of simulating oper- 
ation while the hardware is being designed are 
obvious, it must be noted that running the array 
program on a single-instruction, single-data- 
path computer will be very slow. A task exe- 
cuted as a single instruction on a systolic array 
will require at least N² operations when it runs
on a conventional processor, where N equals the
number of processor elements along one axis of 
the array. 

Consider the addition of two 8-bit, 512-by- 
512-pixel images. A 10-MHz, 8-bit processor 
needs at least 1 second to do the job. As men-
tioned earlier, a grid of 512 by 512 processor elements
could perform the same function in 25 cycles, or
about 2.5 µs.


Breaking with convention 


Another factor that must be considered when 
the simulator runs on a traditional computer is 
the relationship between the memory and a 
processor. A conventional processor passes 
data between itself and memory. The systolic 
array, in contrast, has the aforementioned 128 
bits of RAM associated with each cell, and every 
memory address holds one bit. Consequently, to 
speed the simulator’s operation while simplify- 
ing the development of algorithms, the size of 
the grid should generally be kept down to 6 by 6 
or 12 by 12 processor elements. Fortunately, 
software written for a small array can run 
without modification on larger arrays. 

Once a satisfactory set of algorithms for a 
particular job is complete, an assembler con- 
verts the mnemonic code created by the simula- 
tor into binary instructions for the target ma- 
chine. The assembler produces a binary file 
that can be loaded into a high-speed, 20-bit- 
wide memory, dubbed the instruction queue. 
The queue holds the algorithm for execution at 
the frame rate selected for the system. In a real-
time system, say, data comes in and processed
data goes out simultaneously. As the algo- 
rithms run, a complete loop through the in- 
struction queue is repeated for every new frame 
passing through the grid. 

The kinds of algorithms that must be devel- 
oped for image processing are, of course, direct- 
ly tied to both the specific demands of such 
processing and to the way the array works. 


Image-processing computations are more dis- 
tinctly parallel than those of scientific and 
business calculations, in which memory use and 
the operations performed are far more random. 

The speed with which the systolic array han- 
dles such parallel chores can be clearly seen by 
again comparing the array to a traditional pro- 
cessor. A von Neumann machine requires on 
the order of N X N cycles to process an N X N 
pixel image. That interval is expressed as 
O(N²), which is short for “order N squared”. The
systolic array needs only O(N), or even O(k) 
cycles, where k equals either the number of 
bits per pixel or the number of digits used in the 
calculation, to process the same image. 

When the array processes an image, each 
element is active simultaneously, so the time 
needed to subtract one image from another is 
independent of the size of the image. Algo- 
rithms for the primitive operations of image 
processing—adding and translating an image 
along an axis and manipulating the gray scale 
—can be performed in O(k) time. Furthermore, 
operations that normally occur within the indi- 
vidual registers of a von Neumann processor 
(bit inversion, bit setting or resetting, and bit 
shifting) are easily handled in O(k) cycles by 
the systolic array. 
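The O(k) behavior of bit-serial arithmetic can be seen in a small model. The sketch below assumes images stored as bit planes (least-significant first); the Python loops over rows and columns stand in for work that the array would do in every element at once, so only the outer loop over planes counts as steps. The function names are hypothetical.

```python
def to_planes(image, bits):
    """Bit planes of an integer image, least-significant plane first."""
    return [[[(p >> b) & 1 for p in row] for row in image] for b in range(bits)]


def from_planes(planes):
    """Rebuild integers from bit planes."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [[sum(planes[b][r][c] << b for b in range(len(planes)))
             for c in range(cols)] for r in range(rows)]


def add_planes(a, b):
    """Add two images plane by plane; returns (sum planes, step count).

    The result has one more plane than the inputs: the final carry.
    """
    rows, cols = len(a[0]), len(a[0][0])
    carry = [[0] * cols for _ in range(rows)]
    result, steps = [], 0
    for pa, pb in zip(a, b):              # k steps, whatever the image size
        sm = [[pa[r][c] ^ pb[r][c] ^ carry[r][c] for c in range(cols)]
              for r in range(rows)]
        carry = [[(pa[r][c] & pb[r][c]) | (carry[r][c] & (pa[r][c] | pb[r][c]))
                  for c in range(cols)] for r in range(rows)]
        result.append(sm)
        steps += 1
    result.append(carry)
    return result, steps


if __name__ == "__main__":
    a = [[1, 200], [255, 7]]
    b = [[1, 100], [255, 8]]
    planes, steps = add_planes(to_planes(a, 8), to_planes(b, 8))
    print(from_planes(planes), steps)
```

The step count depends only on the number of bits per pixel, which is the point the article makes about image size.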


Nothing to it 


Other algorithms handled just as readily by 
the device are those requiring information 
about the four or eight neighboring pixels. A 
4-neighborhood algorithm can be defined as 
one using the north, south, east, and west pro- 
cessing elements of a particular portion of an 
image. The eight-pixel neighborhood consists 
of those four plus the northeast, northwest, 
southeast, and southwest cells. Such algo- 
rithms include 3-by-3-pixel convolution, a 
3-by-3-block pattern matching, and various 
types of erosion and dilation. All of these are
classified as local algorithms, since they do not 
require information from any elements other 
than their immediate neighbors. 

Global algorithms, on the other hand, like 
histograms and correlations, need information 
from more distant elements. They take O(N) 
time, much faster than the time demanded by a 
traditional computer. 

Certain fundamental operations are common 


ations. Thresholding determines which pixel 
values are greater or less than a predetermined 
level. In an application that needs to zero (that 
is, ignore or turn into zeros) all the pixels with a
gray-scale value of less than 20, the first step is 
to make a copy of the image’s data base, which 
is destroyed as the task is carried out. 

Since a 6-bit field can represent numbers 
from 0 to 63, adding 44 to every pixel will cause 
all those with values greater than 19 to over- 
flow. The overflow bit plane must then be in- 
verted to yield a zero overflow bit in every pixel 
where that occurred. If the inverted overflow 
bit is then ANDed with the original fields, all 
the pixels that overflowed will have their fields 
zeroed. 
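The add-and-overflow trick can be tried on ordinary integers. This is a sketch rather than the chip's instruction sequence: it assumes 6-bit pixels, derives the offset of 44 from the threshold of 20, and keeps a pixel when its addition overflows, which is the mask that satisfies the stated goal of zeroing pixels below 20 (the inverted overflow plane selects the complementary set). The name `threshold` is illustrative.

```python
FIELD_BITS = 6                          # pixel values run from 0 to 63
FIELD_MASK = (1 << FIELD_BITS) - 1


def threshold(pixels, level=20):
    """Zero every pixel whose value is below `level`."""
    offset = (1 << FIELD_BITS) - level              # 64 - 20 = 44
    result = []
    for p in pixels:
        overflow = ((p + offset) >> FIELD_BITS) & 1   # 1 when p >= level
        keep_mask = -overflow & FIELD_MASK            # all ones or all zeros
        result.append(p & keep_mask)
    return result


if __name__ == "__main__":
    print(threshold([0, 19, 20, 44, 63]))
```

On the array, the same test would run on every pixel at once, with the offset delivered by global broadcast.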

The entire task can be rapidly performed by 
using a global broadcast, which simultaneously 
places a given value (in this case, 44) in every 
processor element in the array. Obviously that 
is faster than moving the data through the ar- 
ray until it has reached each processor element. 
To place the binary value 101100 into RAM lo- 
cations 21 to 26 in every element, the following 
instruction sequence would be executed: 


C:=1
RAM26:=C, C:=0
RAM25:=C, C:=1
RAM24:=C, C:=1
RAM23:=C, C:=0
RAM22:=C, C:=0
RAM21:=C


Another chore common to image processing, 
finding the maximum pixel value in an image, 
lends itself to the architecture of the systolic 
array. A number of algorithms could be used, 
depending on the desired objective. One takes 
advantage of the chip’s Global Output (GO) line 
to furnish the value of the highest-intensity 
pixel (MAXVAL) within an O(k) interval (Pro-
gram 1).


Once the algorithm is completed, the pro-
cessor elements with the maximum intensity
value will have a logic 1 stored in their EW reg- 
isters. The same algorithm can also determine 
the value of the lowest-intensity pixel (MINVAL) 
by first making a negative from the image, 
which is accomplished by simply inverting each 
bit of the pixel. 
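The maximum-value search can be sketched in software as follows. The model assumes one candidate flag per pixel (the role the article gives to the EW register) and represents the global test with Python's `any()`; how that test maps onto the logic level of the GO line is not modeled, and the function names are illustrative.

```python
def find_max(pixels, bits=8):
    """Return (MAXVAL, flags), where flags mark the pixels equal to MAXVAL."""
    flags = [1] * len(pixels)                 # every element starts as a candidate
    maxval = 0
    for n in range(bits - 1, -1, -1):         # scan from MSB to LSB
        masked = [((p >> n) & 1) & f for p, f in zip(pixels, flags)]
        if any(masked):                       # some candidate has bit n set
            maxval |= 1 << n
            flags = masked                    # drop candidates lacking the bit
    return maxval, flags


def find_min(pixels, bits=8):
    """Find the minimum by running the same search on the negative image."""
    all_ones = (1 << bits) - 1
    inverted_max, flags = find_max([p ^ all_ones for p in pixels], bits)
    return inverted_max ^ all_ones, flags


if __name__ == "__main__":
    print(find_max([3, 200, 17, 200, 199]))
    print(find_min([3, 200, 17]))
```

The loop runs once per bit, so the search takes the same number of steps for any image size.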

In some instances, it is desirable to determine 
the location of the highest-intensity pixels. The 
only additions needed are a bit detector (a sim- 
ple comparator) and another algorithm (Fig. 3). 
The comparator simply accepts inputs from the 
array until a logic 1 is picked up. It then sends 


    for m = 1 to M do
      for n = 1 to N do
      {
        EW:=E
        if bit_detect = 1
          pixel_location ← m,n
      }

(Figure labels: controller; on interrupt, send m and n to the host; to east inputs.)





3. By running a specific algorithm, a comparator 
serving as a bit detector can determine the location 
of pixels with the greatest gray-scale values. When a 
logic 1, which denotes such pixels, is observed, the 
controller is interrupted and sends the location of 
the bit to the host. 







to both local and global algorithms. One such 
operation, or building block, is overflow detec- 
tion, which is used for many tasks. 

One approach to it conjoins a 1-bit field with 
each field to be operated upon. Adding a field of 
3 bits and a field of 5 bits will probably cause an 
overflow if it is delivered to a 3-bit field, so a 1
will be placed into the overflow field. The result- 
ant image provides useful information about 
the data being processed. For instance, the 
overflow bit may be used to generate a visual
cue, like light or dark spots on the screen, to in-
dicate which elements have overflowed. It can
be used to interactively adjust the algorithm. 

Among the other operations necessary for 
image processing are common arithmetic func- 
tions like addition, subtraction, and multi- 
plication. Generally, images consist only of pos- 
itive numbers representing the gray-scale 
value of the pixel. Image multiplication is need- 
ed for windowing or masking. A two-dimen- 
sional template representing a window may be 
shifted into the array and multiplied by the 
resident image. Any of these arithmetic oper- 
ations may cause an overflow, which will be in- 
dicated if an overflow bit plane is used in the re- 
sult field. 

Register shifting is taken care of in the same 


manner as moving a contiguous section of 
memory on a standard machine. To shift up- 
ward in memory index, the highest numbered 
element in the block is shifted first, followed by 
the second highest, the third, and so on. Once 
again, overflow detection is needed to deter- 
mine whether an element is shifted out of its 
field, since the program cannot write outside 
the field. 

Translation, another basic operation, is one 
of the simplest for the chip to handle because of 
the relationship between neighboring pro- 
cessors. To shift toward the east a 1-bit field 
located at RAM address 12 within the processor 
array, simply execute: 

EW:=RAM12
EW:=W
C:=EW
RAM12:=C


Here, overflow detection is not needed, since 
there is no possibility of an overflow taking 
place. 
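Translation of a bit plane amounts to every element taking its west neighbor's value. The sketch below assumes a plane stored as a list of rows with column 0 at the west edge; the `wrap` flag chooses between a grounded edge (zeros shifted in) and a wraparound connection. The function name is hypothetical.

```python
def shift_east(plane, wrap=False):
    """Move every bit one column to the east.

    With wrap=False the west edge receives zeros (grounded edge inputs);
    with wrap=True the east edge feeds back into the west edge.
    """
    shifted = []
    for row in plane:
        edge_input = row[-1] if wrap else 0
        shifted.append([edge_input] + row[:-1])
    return shifted


if __name__ == "__main__":
    plane = [[1, 0, 0], [0, 1, 1]]
    print(shift_east(plane))
    print(shift_east(plane, wrap=True))
```

A multi-bit image would be translated by applying the same shift to each of its bit planes in turn.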


Back to basics 


One basic task of image processing, thresh- 
olding, unites a number of the foregoing oper- 


Program 1. Establishing the highest-intensity pixels

COMMENT: initialize EW = 1
NS:=0, EW:=0, C:=1
NS:=0, EW:=C, C:=0
COMMENT: Loop from MSB to LSB and deliver MAXVAL as bit-serial output on GO
for n = 8 to 1 do
{
    NS:=RAMn, EW:=EW, C:=0        (Read next bit from RAM into NS)
    NS:=NS, EW:=EW, C:=CY         (Form NS “and” EW)
    NS:=C, EW:=EW, C:=0           (Send result to GO from NS)
    if GO=1                       (Bit n of MAXVAL = 0 from NS)
    {
        EW:=EW                    (EW retains present value)
    }
    if GO=0                       (Bit n of MAXVAL = 1)
    {
                                  (EW set to 0)
    }
}







Program 2. Binary-tree summing

for m = 0 to 5 do
{
    C:=0
    for n = 1 to (8+m) do
    {
        NS:=RAMn, EW:=EW, C:=C
        NS:=NS, EW:=RAMn, C:=C
        for p = 1 to 2^m
            NS:=S, EW:=EW, C:=C
        RAMn:=SM, C:=CY
    }
    RAM(m+9):=CY
}





Program 3. Sorting pixels into bins

NS:=0, EW:=0, C:=1
NS:=0, EW:=0, C:=1, RAM0:=C                  (Initialize RAM 0 = 1)
for n = 1 to 6 do
{
    (Broadcast bin bit n)
    NS:=0, EW:=0, C:=X                       (Where X is the value of bin bit n)
    NS:=RAMn, EW:=C, C:=1                    (Read bit n of image pixel)
    NS:=RAM127, EW:=EW, C:=1, RAM127:=SM     (SM = 1 if NS matches EW)
    NS:=NS, EW:=RAM0, C:=0                   (Read RAM 0 and compare with RAM 127)
    NS:=NS, EW:=EW, C:=CY                    (CY = 1 if RAM 0 and RAM 127 were both 1)
    NS:=NS, EW:=EW, C:=1                     (If all six bits match, then RAM 0 will continue to contain 1)
}




an interrupt to the controller, which locates the 
highest-intensity pixels by counting the num- 
ber of zeros that preceded them. 


Stand and be counted 


Counting the number of pixels that are 
displayed at maximum intensity is also done 
relatively simply and quickly with the array. 
Traditional processors would take O(N X N) 
operations, but an array-based binary tree ap- 
proach performs a number of additions in par- 
allel, hence requiring only O(log N) operations. 
Several pairs of numbers are added within all 
columns of an array, then pairs of these results 
are added in parallel. The resulting data flows 
upward through the block of arrays until the 
sum reaches the top processor element of each 
column. 

At that point, a second algorithm sums the 
values in the rows until the total for the entire
block is contained in the upper-left-hand pro- 
cessor element. Since translation operations 
cause data to shift into the edge of the array, 
these inputs must be set to zero so that the ex-
ternal data contributes zero to the sum. A
binary-tree summation of a column of 64 num- 
bers first assumes that the numbers are 8-bit 
pixel values. They are also assumed to reside in 
RAM locations 1 for the LSB to 8 for the MSB 
(Program 2). The partial sums are stored in 
RAM locations 1 through 14. 
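The binary-tree summation of a column can be modeled with ordinary integers. In this sketch, index 0 stands for the top processor element, values arriving from beyond the south edge are taken as zero (as the text requires), and each pass of the loop represents one level of additions that the array would perform in all elements simultaneously. The name `tree_sum` is illustrative.

```python
def tree_sum(column):
    """Sum a column in log2(len) parallel levels; returns (total, levels).

    The column length is assumed to be a power of two.
    """
    values = list(column)
    size = len(values)
    distance, levels = 1, 0
    while distance < size:
        # Each element adds the value held `distance` elements to its south.
        values = [values[i] + (values[i + distance] if i + distance < size else 0)
                  for i in range(size)]
        distance *= 2
        levels += 1
    return values[0], levels


if __name__ == "__main__":
    print(tree_sum(list(range(64))))
    print((64 * 255).bit_length())   # bits needed for the largest possible sum
```

For 64 eight-bit values the sketch takes six levels, matching the loop of m from 0 to 5 in Program 2, and the largest possible total fits in 14 bits, matching the 14 RAM locations cited.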


Straightforward convolutions 


Convolution is one of the most important jobs 
performed in image processing. It uses the pre- 
viously described neighborhood algorithm to 
determine new values for pixels, thereby en- 
hancing an image. Convolutions are put to work 
along the entire range of image processing, 
from upgrading old photographs to improving 
the definition of edges in a robotic vision 
system. 

Convolution is characterized by a high level 
of parallelism, so it is well suited to the systolic 


array. Typically, a template of new values is 
placed over the values of the camera image. 
Global broadcasting distributes the template. 
The objective is to move the sum outward in a
spiral from the center of the template, which is 
the location of the new pixel value, to each of 
the matrix elements that reside under the 
template. At each matrix location a multipli-
cation is performed, and the result is added to a
traveling sum. The image resulting from this 
convolution is enhanced. Since all of the sum- 
mations occur simultaneously, the parallel 
array processor handles the job at a good clip. 

Histograms, which count the number of pix- 
els containing particular gray-scale values, can 
make adjustments for changes in lighting, as 
well as let systems adjust to very light or very 
dark images. In that way they improve visual 
information at either end of the intensity 
spectrum. 

The process is handled as quickly as the 
array’s global-sum operation counts the pro- 
cessor elements. The elements to be counted are 
first identified by broadcasting a gray-scale 
value to every processor element and com- 
paring it with the pixel value stored in each. 
Matches to the image stored in RAM locations 1 
to 6 are determined by using a specific algo- 
rithm (Program 3). Various values are broad- 
cast to create series of “bins,” with different 
pixel levels sorted into the appropriate bins. 

After this task is finished, every processor 
element that holds a pixel matching the broad- 
cast pixel will have a logic 1 in RAM location 0. 
Before counting the number of pixels, a quick 
check for GO= 1 will indicate if there were any 
pixels at all which matched the broadcast value. 
By determining the number of pixels in the var- 
ious bins, the system can figure out whether the 
image is dark or light or contains a variety of 
shades, making adjustments as necessary.
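The broadcast-and-compare step behind the histogram can be sketched as follows. The model assumes 6-bit pixels and keeps one match flag per pixel (the role the article gives to RAM location 0); the `any()` call stands in for the quick global check before counting, and the ordinary `sum()` stands in for the array's global-sum operation. Function names are hypothetical.

```python
def match_flags(pixels, value, bits=6):
    """Flag each pixel that equals the broadcast value, one bit at a time."""
    flags = [1] * len(pixels)                   # start with every flag set
    for n in range(bits):
        broadcast_bit = (value >> n) & 1
        # A flag survives only while the pixel bit agrees with the broadcast bit.
        flags = [f & (1 ^ ((p >> n) & 1) ^ broadcast_bit)
                 for f, p in zip(flags, pixels)]
    return flags


def histogram(pixels, bits=6):
    """Count the pixels in each occupied bin."""
    counts = {}
    for value in range(1 << bits):
        flags = match_flags(pixels, value, bits)
        if any(flags):                          # skip bins with no matches
            counts[value] = sum(flags)
    return counts


if __name__ == "__main__":
    print(match_flags([5, 44, 5, 63], 5))
    print(histogram([5, 44, 5, 63]))
```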


Acknowledgment 

The authors wish to thank Martin Marietta Aerospace 
(Orlando, Fla.) for their contribution to the devel- 
opment of the GAPP architecture. 


DESIGN ENTRY 





Systolic array chip 
recognizes visual patterns 
quicker than a wink 





Simultaneously processing a host of pixel values, 
a monolithic systolic array gives pattern recognition 
systems the get up and go to work in real time. 





The third article in a series dedicated to the first com- 
mercial systolic array processor focuses in on pattern 
recognition. The first (Oct. 31) introduced the chip and 
the second (Nov. 15) investigated its use in image ma-
nipulation. Forthcoming discussions will explore using
the chip as an associative memory unit and an asso- 
ciative processor, as well as data-base management. 


Winthrop W. Smith Jr., Whitman Engineering Inc.
Paul Sullivan, NCR Microelectronics

Winthrop W. Smith Jr. is vice president of research and
engineering at Whitman Engineering in Maitland, Fla.
He previously worked at Martin Marietta Aerospace,
where he spearheaded the development of the Geometric
Arithmetic Parallel Processor architecture. Smith holds
a BS and PhD in electrical engineering from Johns Hop-
kins University and a master's in electrical engineering
from Case Western Reserve University.

Paul Sullivan is the business unit director for the digital
signal-processing devices group at NCR's Microelec-
tronics Division in Fort Collins, Colo. He holds a PhD
and an MSEE from the University of Southern California
and a BSEE from Colorado State University. Before
NCR, he worked for Hughes Research Laboratories.

Reprinted from ELECTRONIC DESIGN - November 29, 1984

Pattern recognition is a simple concept
that lets automatic inspection and image-
processing systems emulate the cognitive
talents human beings take for granted.
Working with it, a system looks at an object,
determines what it is (often improving the
image sent to it in the process), and establishes
if the object under scrutiny meets specific
criteria.

A multitude of algorithmic approaches and
hardware architectures have been tried in the
attempt to let machines quickly convert data 
from sensors into information that can be used 
to decide on the next action. The Geometric 
Arithmetic Parallel Processor (GAPP), a CMOS
chip that carries 72 single-bit microprocessors 
that run in parallel, is suited to many of these 
schemes. 

Its multiple data paths make it right at home 
in a range of settings, allowing designers to 
turn to a single device instead of to a number of 
dedicated pieces of hardware. Further, specific 
algorithms can be assigned to various tasks, 
since the software for the array has more in 
common than do programs designed for unre- 
lated hardware. 

Despite the assortment of algorithms de- 
voted to pattern recognition, the approach can 
generally be broken down into two categories — 
template matching and feature matching. The 
first scheme is the most straightforward. Using 
it, a system simply compares an incoming im-
age to those stored in memory until the match-
ing pattern, or template, is found.

Feature matching is more sophisticated and 
thus demands more processing power. In it, the 
system views the length, width, and other char- 
acteristics of an object to determine what it is 
without comparing it to a template. 

Either technique can be readily handled by 
the systolic array. Both comprise four basic 
tasks, or blocks: improving, screening, extracting,
and deciding (Fig. 1). Each may be
processed by a dedicated hardware component, 
or two or more tasks can be processed by the 
same hardware. 

The improver block accepts raw data from
the sensor, in many cases, a camera. Then it 
either restores the signal, correcting degrada- 
tions caused by the sensor, or enhances it, 
boosting the quality of the image to facilitate 
further processing. Signal restoration typically 
encompasses image deblurring, while enhance- 
ment includes edge enhancement or smoothing. 

The data is then passed along to the screening 
block, which removes the information not re- 
quired for the sophisticated algorithms that 
follow. Among this extraneous data is both 
noise and pixels below a given threshold. 

The extractor pulls out the primitive charac-
teristics that can be used by the decider to
recognize the object. And the final block, the de-
cider, looks at all the characteristics and attri-
butes gathered concerning the object and its
surroundings, compares them with its under- 
standing of the object’s features, and decides 
whether it recognizes the object. 

The systolic array chip can be put to work in 
any of these blocks and is easily configured to 
process all of these sequential steps at the real-time
rate of most video cameras (10 MHz). At present, they
are handled by the hardware architecture deemed most
appropriate for the
task at hand. Both improving and screening are 
high-speed computations that involve a small 
number of dedicated algorithms, so they are 
generally handled by pipelined processors. Ex- 
tracting and deciding are usually taken care of 
by parallel microcomputers because of the wide 
variety of algorithms and different types of 
computations they involve. 


Process in parallel 


Another scheme, massively parallel pro- 
cessing, dedicates one processor to every pixel 
in an image, thus ensuring very high speeds. 
The systolic array goes with this approach; con- 
sequently, designers have that particular ar- 
chitecture available in relatively small 
systems. 

In practice, arrays can be grouped together 
until the number of processor elements match- 
es the number of pixels being processed. Fur- 
ther, each of the chip’s 72 processor elements 
has 128 bits of dedicated RAM, which gives it an 
additional level of flexibility. This internal 
memory also increases throughput by elimi- 
nating the time-consuming data fetches re- 
quired with typical von Neumann processors. 

1. After data is picked up by a sensor, the typical pattern recognition
system processes it in four sequential tasks: improving, screening,
extracting, and deciding. Any of the four can be handled by the
Geometric Arithmetic Parallel Processor, set up as a dedicated
hardware component. Alternatively, two or more tasks can be shared
by the same hardware.


DESIGN ENTRY
Systolic array chip


The array's single-instruction, multiple-data
path is particularly well suited to the aforementioned
task of improving, since the algorithms
used for it demand pixel processing
using only local comparisons to determine the 
value of a pixel. This so-called neighborhood 
processing takes advantage of the structure of 
the chip, in which each processor element com- 
municates with its nearest neighbors on the 
north, south, east and west. 


Back to basics 


One of the most fundamental algorithms 
used in restoration consists mainly of adding 
successive frames of the same image (on a pixel- 
by-pixel basis) to yield a running average. 
Doing so improves the signal-to-noise ratio of 
the image, making it easier for the system to 
process and making it more visually pleasing to 
the operator overseeing the task on a display. 
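
The frame-averaging idea described above can be sketched in a few lines of Python. This is only an illustration on a conventional processor, not the array's bit-serial code; the function name `running_average` and the nested-list frame format are assumptions made for the example.

```python
def running_average(frames):
    """Average successive frames of the same scene, pixel by pixel.

    `frames` is a list of equally sized 2-D lists of pixel values
    (a hypothetical format chosen for this sketch).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    total = [[0] * cols for _ in range(rows)]
    for frame in frames:
        for r in range(rows):
            for c in range(cols):
                total[r][c] += frame[r][c]  # pixel-by-pixel add
    count = len(frames)
    # Dividing the accumulated sum by the frame count gives the average.
    return [[total[r][c] / count for c in range(cols)] for r in range(rows)]
```

On the array itself, every pixel's add would proceed at the same time in its own processor element; the loops here only stand in for that parallelism.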

The actual code used to add two 8-bit images 
(see Program 1) assumes that the first is stored 
in RAM locations 0 to 7 (with the MSB at the 
highest location) and that the second is stored 
in RAM locations 8 to 15 (the MSB here is held 
in location 15). It also takes for granted that 
both words are positive. Simple extensions, 
though, allow negative numbers to be added or 
subtracted. 

When the 25 instructions are finished, the 
two input images remain in the same RAM lo- 
cations, while the sum of the images is stored in 
locations 16 to 24. The number of instructions
needed to add an m-bit number to an n-bit number,
where n ≥ m, can easily be determined with
the equation 3m + 2(n - m) + 1. When both
numbers are 8 bits wide, as in the above example,
the result is 25.
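
As a rough check on the instruction count, the bit-serial addition can be modeled in Python. The sketch assumes the pattern visible in Program 1, namely three instructions per bit (load one operand bit, load the other, store the sum while latching the carry) plus one final instruction to store the carry. The helper names `to_bits`, `from_bits`, and `bit_serial_add` are hypothetical.

```python
def to_bits(value, width):
    """Split an integer into `width` bits, LSB first (RAM location 0 holds the LSB)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Reassemble an integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits, b_bits):
    """Add two equal-length positive numbers one bit per step.

    Returns the sum bits (one bit longer than the inputs) and the
    number of instructions a three-per-bit scheme would need.
    """
    carry = 0
    out = []
    instructions = 0
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)   # the sum output
        carry = total >> 1      # the carry output, fed back for the next bit
        instructions += 3       # two loads plus one store per bit
    out.append(carry)           # the final carry becomes the top bit
    instructions += 1
    return out, instructions
```

For two 8-bit operands the model returns a 9-bit sum and a count of 25, matching the figures quoted in the text.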

Another common algorithm, this one used in
image enhancement, is a finite-impulse-response
filter. The equation

$$Y(n) = \sum_{i=0}^{N-1} a(i)\, I(n-i)$$

represents the output, Y(n), in terms of the input,
I(n). It consists of both adds and shifts.
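
A direct evaluation of the filter equation might look like the following Python sketch. The name `fir_filter` is hypothetical, and the choice to treat samples before the start of the input as zero is an assumption of the example, not something the article specifies.

```python
def fir_filter(samples, coeffs):
    """Compute Y(n) = sum over i of a(i) * I(n - i) for every n.

    `samples` is the input sequence I, `coeffs` the N coefficients a.
    Inputs before the first sample are taken as zero (an assumption).
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for i, a in enumerate(coeffs):
            if n - i >= 0:
                acc += a * samples[n - i]
        out.append(acc)
    return out
```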

The objects being observed are generally well 
defined in contrast to the background, making 
it easy for the system to pull out the character- 
istics needed to recognize a pattern. When the 
data is received by the screening block, however,
one of its key tasks is to replace the weak
signals along the edges of the object with
stronger signals.

A common technique used for edge enhance- 
ment calls for a Sobel filter, a two-dimensional 
finite-impulse-response filter with a threshold- 
ing algorithm. The Sobel filter takes an existing 
image and creates a new one comprising the 
magnitude and direction of all the strong edges 
of the object. 

The filter works with neighborhood processing,
determining the value of a pixel by examining
those adjacent to it. It operates on a 3-by-3-pixel
grouping, consisting of pixels A through I (Fig. 2).


Program 1. Double vision:
Adding two 8-bit images


NS:=RAM 0; C:=0
EW:=RAM 8
RAM 16:=SM; C:=CY
NS:=RAM 1
EW:=RAM 9
RAM 17:=SM; C:=CY
NS:=RAM 2
EW:=RAM 10
RAM 18:=SM; C:=CY
NS:=RAM 3
EW:=RAM 11
RAM 19:=SM; C:=CY
NS:=RAM 4
EW:=RAM 12
RAM 20:=SM; C:=CY
NS:=RAM 5
EW:=RAM 13
RAM 21:=SM; C:=CY
NS:=RAM 6
EW:=RAM 14
RAM 22:=SM; C:=CY
NS:=RAM 7
EW:=RAM 15
RAM 23:=SM; C:=CY
RAM 24:=C

[Figure 2. Panel (b): row sums (A+B), (B+C); (D+E), (E+F); (G+H), (H+I).
Panel (c): combined sums (A+B)+(B+C); (D+E)+(E+F); (G+H)+(H+I).]

2. Edge enhancement using a Sobel filter furnishes
a new value for pixel E by comparing it with those
(A through D and F through I) that surround it (a).
Pixel values are added together in parallel (b, c) to
yield the value (d). All values are added in parallel.

Program 2. Sobel filter to
establish pixel value


Line  Code

  1   EW:=RAM 0; C:=0
  2   EW:=E; NS:=RAM 8
  3   RAM 16:=SM; C:=CY
  4   EW:=RAM 1
  5   EW:=E; NS:=RAM 9
  6   RAM 17:=SM; C:=CY
  7   EW:=RAM 2
  8   EW:=E; NS:=RAM 10
  9   RAM 18:=SM; C:=CY
 10   EW:=RAM 3
 11   EW:=E; NS:=RAM 11
 12   RAM 19:=SM; C:=CY
 13   EW:=RAM 4
 14   EW:=E; NS:=RAM 12
 15   RAM 20:=SM; C:=CY
 16   EW:=RAM 5
 17   EW:=E; NS:=RAM 13
 18   RAM 21:=SM; C:=CY
 19   EW:=RAM 6
 20   EW:=E; NS:=RAM 14
 21   RAM 22:=SM; C:=CY
 22   EW:=RAM 7
 23   EW:=E; NS:=RAM 15
 24   RAM 23:=SM; C:=CY
 25   RAM 24:=C
 26   EW:=RAM 16; C:=0
 27   EW:=W; NS:=RAM 16
 28   RAM 25:=SM; C:=CY
 29   EW:=RAM 17
 30   EW:=W; NS:=RAM 17
 31   RAM 26:=SM; C:=CY
 32   EW:=RAM 18
 33   EW:=W; NS:=RAM 18
 34   RAM 27:=SM; C:=CY
 35   EW:=RAM 19
 36   EW:=W; NS:=RAM 19
 37   RAM 28:=SM; C:=CY
 38   EW:=RAM 20
 39   EW:=W; NS:=RAM 20
 40   RAM 29:=SM; C:=CY
 41   EW:=RAM 21
 42   EW:=W; NS:=RAM 21
 43   RAM 30:=SM; C:=CY
 44   EW:=RAM 22
 45   EW:=W; NS:=RAM 22
 46   RAM 31:=SM; C:=CY
 47   EW:=RAM 23
 48   EW:=W; NS:=RAM 23
 49   RAM 32:=SM; C:=CY
 50   EW:=RAM 24
 51   EW:=W; NS:=RAM 24
 52   RAM 33:=SM; C:=CY
 53   RAM 34:=C
 54   NS:=RAM 25
 55   NS:=S; C:=0
 56   NS:=RAM 25; EW:=NS
 57   NS:=N
 58   RAM 35:=SM; C:=BW
 59   NS:=RAM 26
 60   NS:=S
 61   NS:=RAM 26; EW:=NS
 62   NS:=N
 63   RAM 36:=SM; C:=BW
 64   NS:=RAM 27
 65   NS:=S
 66   NS:=RAM 27; EW:=NS
 67   NS:=N
 68   RAM 37:=SM; C:=BW
 69   NS:=RAM 28
 70   NS:=S
 71   NS:=RAM 28; EW:=NS
 72   NS:=N
 73   RAM 38:=SM; C:=BW
 74   NS:=RAM 29
 75   NS:=S
 76   NS:=RAM 29; EW:=NS
 77   NS:=N
 78   RAM 39:=SM; C:=BW
 79   NS:=RAM 30
 80   NS:=S
 81   NS:=RAM 30; EW:=NS
 82   NS:=N
 83   RAM 40:=SM; C:=BW
 84   NS:=RAM 31
 85   NS:=S
 86   NS:=RAM 31; EW:=NS
 87   NS:=N
 88   RAM 41:=SM; C:=BW
 89   NS:=RAM 32
 90   NS:=S
 91   NS:=RAM 32; EW:=NS
 92   NS:=N
 93   RAM 42:=SM; C:=BW
 94   NS:=RAM 33
 95   NS:=S
 96   NS:=RAM 33; EW:=NS
 97   NS:=N
 98   RAM 43:=SM; C:=BW
 99   NS:=RAM 34
100   NS:=S
101   NS:=RAM 34; EW:=NS
102   NS:=N
103   RAM 44:=SM; C:=BW
104   NS:=0; EW:=0
105   RAM 45:=SM

For the 3-by-3-pixel grouping of Fig. 2, the equation
that determines the Y axis values is

$$Y = (A + 2B + C) - (G + 2H + I) = [(A + B) + (B + C)] - [(G + H) + (H + I)]$$

The X axis values are established by

$$X = (C + 2F + I) - (A + 2D + G) = [(C + F) + (F + I)] - [(A + D) + (D + G)]$$
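
The two equations can be tried out with a small Python sketch that uses the same pairwise grouping of sums. It works on one interior pixel at a time, whereas the array computes every pixel at once. The function name `sobel_xy` and the list-of-rows image format are assumptions made for illustration.

```python
def sobel_xy(image, r, c):
    """Return (X, Y) for the interior pixel at row r, column c.

    The neighborhood is labeled A..I row by row, with E at the center.
    """
    A, B, C = image[r - 1][c - 1], image[r - 1][c], image[r - 1][c + 1]
    D, F = image[r][c - 1], image[r][c + 1]
    G, H, I = image[r + 1][c - 1], image[r + 1][c], image[r + 1][c + 1]
    # Pairwise sums first, then the difference, as in the grouped form.
    y = ((A + B) + (B + C)) - ((G + H) + (H + I))
    x = ((C + F) + (F + I)) - ((A + D) + (D + G))
    return x, y
```

A vertical edge (bright right-hand column) gives a large X and zero Y; a horizontal edge (bright top row) gives the reverse.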


The code for processing the first equation 
(see Program 2) reveals the unusual aspects in- 
volved in programming the systolic array. The 
true key to the chip’s speed lies in the simulta- 
neous computations it carries out. The first 25 
instructions add each pixel to its neighbor to 
the west to form a new image. 

Likewise, instructions 26 through 53 add this
new image to the eastern neighbors, forming a
second image. Instructions 54 to 105 add each
pixel in that new image to its neighbor to the
north and finally add each pixel in this third
image to its southern neighbor. The resulting
data, which is determined for every pixel in the
image, can have as many as 10 bits, as well as a
+ or - sign. The latter denotes whether the
edge gradient is changing from black to white
or white to black. The fact that each of these
values is computed in parallel results in fast
throughput, since the value for the middle row
is shifted upward, where it becomes the bottom
row of the 3-by-3-pixel grid being processed in
an adjacent processor element. The value also is
shifted downward to the adjacent processor
element, where it becomes the top row for the
grid being processed there, a 3-by-3-pixel
group centered around H.

The following Sobel operations are performed
within individual cells of the array.
First, a reasonable approximation of the magnitude
of the gradient vector is computed by
adding the absolute values of X and Y. To obtain
that result, the sign bit plane for X determines
whether to invert it and add one to it to form the
absolute value. Processing the absolute value of
a number takes 3m+8 instructions, where m
equals the number of bits.
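
The invert-and-add-one step and the |X| + |Y| approximation can be illustrated in Python as follows. The 11-bit default width (10 data bits plus sign) is an assumption drawn from the bit count mentioned earlier, and the function names are hypothetical.

```python
def absolute_value(value, bits):
    """Two's-complement absolute value: invert and add one when the sign bit is set."""
    mask = (1 << bits) - 1
    word = value & mask                    # two's-complement encoding of value
    sign = (word >> (bits - 1)) & 1        # the sign bit selects the operation
    if sign:
        word = ((word ^ mask) + 1) & mask  # invert every bit, then add one
    return word


def gradient_magnitude(x, y, bits=11):
    """Approximate the gradient magnitude as |X| + |Y|."""
    return absolute_value(x, bits) + absolute_value(y, bits)


def abs_instruction_count(m):
    """Instruction count quoted in the text for an m-bit absolute value."""
    return 3 * m + 8
```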

Next, the direction of the gradient to the
nearest 45° line is determined. This must be
done so that data can be processed by the
extractor block and is accomplished by using the
signs of both X and Y to determine which quadrant
contains the gradient. Once the direction is
established, another process, consisting of
18m + 76 instructions, is performed to bring the
vector in line with the nearest 45° angle.

In thresholding, the final step, the value is 
compared with a predetermined constant. The 
result of this operation is used to pass or reject 
the direction vector, a step necessary to ascer- 
tain that it is valid data and not simply noise. If 
the vector information is below the threshold, 
the resulting word consists of 8 zeros. If it is 
valid, the location of a single 1 in the 8-bit word 
will denote which of eight directions the vector 
lies closest to. Thresholding requires m+13
instructions, with another 18 instructions needed
to properly place the 1 in valid words.
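
The direction and thresholding steps can be summarized in a short Python sketch. Two points are assumptions of the example rather than facts from the article: the mapping of bit k to the direction k × 45° measured counterclockwise from the positive X axis, and the use of `math.atan2` in place of the sign tests and comparisons the array would perform. The function name `direction_word` is hypothetical.

```python
import math


def direction_word(x, y, threshold):
    """Return an 8-bit word with a single 1 marking the nearest 45-degree direction.

    The word is all zeros when |X| + |Y| falls below the threshold.
    """
    magnitude = abs(x) + abs(y)
    if magnitude < threshold:
        return 0                                   # rejected as noise
    angle = math.degrees(math.atan2(y, x)) % 360.0
    sector = int((angle + 22.5) // 45) % 8         # round to the nearest 45 degrees
    return 1 << sector                             # one-hot direction code
```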


The last word


The resultant word is then passed to the
extractor. At this point, a spoke filter is called into
play. It is used to detect objects of various
shapes and to extract length, width, and other
data from the image. For example, the filter can
be used to determine how many of the radial
spokes along eight axes have Sobel gradients
that point to or away from a specific pixel. The
expected size and shape of the object determine
the lengths of the spoke's arms, which
can range from a single pixel to very large pixel
blocks.

The final block in a pattern recognition sys- 
tem handles the task of deciding. This is typi- 
cally carried out by a group of microprocessors 
or bit-slice processors that receive, store, and 
then manipulate the object characteristics that 
have been extracted in the preceding series of 
operations. These manipulations generally in- 
volve projecting a feature into a previously 
determined and segmented space to determine 
what type of object makes up the image. Al- 
though systolic arrays can perform this chore, 
it will probably remain the realm of standard 
processors. Algorithms are in plentiful supply 
that make good use of standard von Neumann 
architectures, which are well suited to the task. 

However, one of the most challenging aspects 
of implementing the decider hardware is link- 
ing it with the high-speed front end hardware. 
Often, this means adding several microcompu- 
ters, hence demonstrating the inherent net- 
working inefficiencies of these architectures. 


The best configuration for transmitting data 
from the systolic array blocks is to use a so- 
called corner-turning block, a small group of 
systolic arrays that reformats data for the 
microprocessor. The systolic array normally 
sends out data in bit planes instead of the 8- or 
16-bit words microprocessors work with so 
readily. The corner-turning block formats the 
bit planes into the gray scale values that make 
up a pixel. At the same time, the arrays can 
postprocess the pixels before passing the final 
decider data on to the microcomputer network. 
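
Corner turning amounts to a transpose between words and bit planes, which the following Python sketch illustrates. The function names are hypothetical, and the sketch says nothing about how the array hardware actually moves the bits.

```python
def words_to_bit_planes(words, width):
    """Plane k holds bit k of every word (the word-parallel, bit-serial form)."""
    return [[(w >> k) & 1 for w in words] for k in range(width)]


def bit_planes_to_words(planes):
    """Reassemble ordinary words, such as gray-scale pixel values, from bit planes."""
    count = len(planes[0])
    return [
        sum(planes[k][j] << k for k in range(len(planes)))
        for j in range(count)
    ]
```

Converting a list of pixel values to planes and back returns the original list, which is the property the reformatting block relies on.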


Flexibility is the key 


Each of the four tasks—improving, screen- 
ing, extracting, and deciding—can be handled 
easily by systems built around the systolic 
array chip. The IC can be incorporated into a 
number of setups, ranging from those that rec- 
ognize moving objects, through those that work 
with an assembly line robot, to those that rec- 
ognize characters. Indeed, the chip’s flexibility 
can be quickly grasped by these examples. Al- 
though all use roughly the same overall system 
architecture, the array can be programmed to 
meet the various algorithmic and speed de- 
mands called for by particular tasks. Thus a 
generic system can work in a variety of applica- 
tions, giving it the flexibility associated with 
traditional microprocessors (Fig. 3).

The system itself is centered on the main 
block of chips, but it also employs the array to 
take care of reformatting. These blocks will 
typically be used to format data for input as 
well as for output. 


On automatic pilot 


One example of this generic approach is an 
automatic inspection system that scans parts 
as they move along a production line. The 
system's data rate runs from 8 to 16 million 
words/s, with real-time response required. Im- 
proving and screening would require 1 or 2 algo- 
rithms; the extractor would execute from 10 to 
20 algorithms. The decider would perform 1 to 3
algorithms using the resultant data. 

The major design tradeoff that must be con- 
sidered when building this or any other generic 
system is deciding how many processor elements
are needed and how they should be arranged.
That, in turn, hinges on the throughput
rate and the number of instructions that must
be performed.

For automatic inspection, as many as 3300 in- 
structions might be performed on each frame of 
data. Using a 10-MHz chip, these can be processed
in 330 µs. Since a standard 30-Hz frame
is sent only every 33 ms, the chip would be oper- 
ating only 1% of the time if an individual pro- 
cessor element were dedicated to each of the 512 
by 512 pixels. It is thus possible to trim costs by 
building a smaller block of systolic array chips 
and passing the data through the block several 
times. 


On the tube 


To process 3300 instructions, the block would
require roughly 2623 processor elements, or
about 37 chips. Since data comes from the
sensor in the standard TV-line format, the most
natural way to arrange the chips, which are
themselves laid out as 6 by 12 processor elements,
is to work with a grid of 6 by 516 elements.
Thus the system could handle six incoming
rows of data at once because the array's
communication registers, which move data in
and out, could easily handle the 512 useful
samples of data coming from a 10-MHz TV
camera.
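
The sizing argument can be checked with a few lines of arithmetic in Python. All constants come from the text except the 33-ms frame period, which is the rounded figure the article uses; with it the estimate lands at about 2620 processor elements, in line with the rough figure quoted above. The variable names are chosen for this sketch only.

```python
import math

CLOCK_HZ = 10e6                 # 10-MHz instruction rate
INSTRUCTIONS_PER_FRAME = 3300   # worst-case load for automatic inspection
FRAME_PERIOD_S = 33e-3          # one frame every 33 ms (30-Hz video, rounded)
PIXELS = 512 * 512              # pixels per frame
ELEMENTS_PER_CHIP = 72          # 6 by 12 processor elements per chip

# Time to run the whole instruction stream once.
processing_time_s = INSTRUCTIONS_PER_FRAME / CLOCK_HZ

# Fraction of the frame period a one-element-per-pixel array would be busy.
duty = processing_time_s / FRAME_PERIOD_S

# A smaller block reused across the frame needs only that fraction of elements.
elements_needed = PIXELS * duty
chips_needed = math.ceil(elements_needed / ELEMENTS_PER_CHIP)
```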

However, if the spokes employed in feature 
extraction are 10 pixels long, the span of the 
spoke wheel is 20 pixels vertically. With only six 
cells aligned vertically, that causes severe 
complications in implementing the spoke algo- 
rithm. 


Three solutions 


There are three ways out of this problem. The 
simplest is to add two or three more rows of 
chips so the block is 18 to 24 elements deep. 
There will be some overlap problems, but the 
added processing power makes them fairly easy 
to solve. 

Alternatively, two frame buffers could be 
added before the input reformatting block. 
That lets the system send rectangular arrays of 
data into the main block of chips. Overlap prob- 
lems would be less severe, but this solution 
drives cost up since more hardware is used. The
tradeoff between the two solutions comes down
to the cost and time needed to design in the
requisite buffers, on the one hand, and the
added software needed to process the spoke-filter
algorithm when only six rows of cells
are used, on the other.


3. A generic system based on the systolic array chip can be used in many applications. Systolic
arrays are not only the main system building block, but are employed to reformat data going into
and coming out of the main array. The last performs the processing needed to determine the
characteristics of the object under scrutiny.


The third time’s the charm 


The ultimate solution goes with the 6-by-516-cell
layout, but stores three or four sets
of six video lines in the cells using each element's
128 bits of RAM. Four sets of six rows
would occupy only 32 RAM locations when 8-bit
words are used. If this scheme leaves enough
RAM to perform the necessary computations, it
is an attractive solution, since 24 video lines can
be processed simultaneously with a single row
of chips.

Other systems can be configured using the 
same basic approach. Varying data input rates 
and the type of data being processed, though, 
will force minor changes in the actual arrange- 
ment of the blocks. One variation on the basic 
system enables it to be put to work enhancing 
images in computer-aided tomography. Im- 
proving the quality of the images from a CAT 
scanner has the obvious advantage of giving 
the physician a better chance to locate and 
identify abnormalities. 

Since input from the scanner comes in large 
blocks of data—every few minutes or seconds— 
the system also requires an input buffer. It 
must have a high-speed input port, although it 
can send data to the block of arrays at a slower 
rate. 

The variety of algorithms needed for the
many chores of a physician mandates a fairly
large memory buffer for the controller. The
main block of processor elements should be
rectangular; the total number of cells will again
depend on the algorithmic load and the frequency
of input. It might also be a good idea for
the system to have an interactive interface so
that the doctor can refine the image while it is
being viewed.
















DESIGN ENTRY 





Associative memory 
calls on the talents 
of systolic array chip 





A monolithic systolic array puts its on-chip 
memory to good use, first searching out the desired 
information and then processing it. 





This is the fourth in a series of articles dedicated to the 
first commercial systolic array processor chip. The 
series began with the cover article in the Oct. 31 issue
(p. 207) and has continued in every consecutive issue. 


How to retrieve data from memory is a
classic problem for designers. Theoretically,
one of the simplest ways to get information
is to match the memory contents
with the desired key, much as an instructor
calls upon pupils to volunteer an answer to a
question. However, processors traditionally
force programmers to call memory using addresses.
That scheme often makes for relatively
slow processing. Searching through a set of
numbers for the one with the highest value, for
instance, typically forces a von Neumann processor
to examine each member in the set at
least once.

An alternative is an associative memory
system, which matches some part of the desired
data within memory instead of requiring addresses.
Known by a host of names, including
content-addressable memory, data-addressed
memory, parallel search memory, and search
associative memory, the technique has been
called into service a number of times in various
applications. But such systems are usually fairly
large and expensive.


Lyle Wallis, NCR Microelectronics

Lyle Wallis is an applications engineer at NCR's Microelectronics
Division in Fort Collins, Colo. He holds
a BSEE from the University of Missouri at Columbia
and an MSEE from the University of California at
Berkeley.


Reprinted from ELECTRONIC DESIGN - December 13, 1984

The Geometric Arithmetic Parallel Pro- 
cessor, or GAPP, takes a new approach to this 
long-standing dilemma. The first systolic array 
processor chip, the device is not only well suited 
to associative memory but can be configured in 
a comparably small system as well. Further, 
since it performs logical and arithmetic oper- 
ations, it also serves as an associative processor 
that works on the data found in a memory 
search. 

The chip carries 72 single-bit processors that 
run in parallel as a single-instruction, multiple-data
path system. Each processor element is
fitted out with 128 bits of dedicated RAM. Also 
vital to associative memory is the IC’s global 
broadcast function, which lets users transmit 
data, such as the search word, to all of the pro- 
cessor elements simultaneously. 

The tremendous speed advantage of parallel
processing over conventional methods is demonstrated
in the relative time needed to perform
associative memory searches. Searching
for a single number in a set with N members, a
traditional processor running an algorithm
that lets it look just once at each number takes
up to N cycles to locate the desired data. The
associative processor, in contrast, needs M cycles,
where M equals the number of bits in the target
information. The device interrogates the
memory entries in parallel, and once the proper
information has been found, it can be either
processed by the associated processor or passed
directly to the host.

An associative memory unit can be broken 
down into two main components: an associative 
array and an associative array controller (Fig. 
1). The arrangement is similar to traditional 
memories, which consist of an array of addressable
cells and a controller.


Other functions 


In addition to managing the memory, the 
controller also handles sequencing. It contains 
two registers: one holds the data that the host is 
looking for, called the “comparand”; the other 
contains a mask, which screens out unwanted 
bits during processing. 

The associative array comprises a group of
cells, each of which generally holds a single
word. Each cell has a tag bit to notify the controller
when its data matches the sought-after
word. When there is a match, the tag bit is set
and that cell is termed a responder. If the desired
data is not in that cell, the tag bit remains
unchanged and the cell is dubbed a nonresponder.


[Figure 1. Blocks shown: associative array controller; target-word, or comparand, register; associative array.]

1. An associative memory consists of a controller
and an array of memory cells. The first section manages
the second and stores the target word, or
comparand, and the mask, both of which come
from the host. The tag bits attached to each cell denote
what data is located at any one.

Every cell within an associative array also 
performs three tasks—compare, write, and 
read. The first and most fundamental task 
simply compares the masked target data with 
the contents of the cells, setting the tag bit if 
there is a match. The second then writes a data 
word to the responding cells. Read, the final 
function, shifts the contents of the responding 
cells to the output bus. If there is more than one 
responder, the output is the bit-by-bit logical 
OR of all the responding cells. These three functions
are carried out only on data in the responding
cells; nonresponding cells are untouched.
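
The three cell functions can be modeled at the word level with a small Python class. This is a behavioral sketch only: the class name `AssociativeArray` and its methods are hypothetical, and as a simplification each compare recomputes every tag from scratch rather than narrowing an existing set of responders.

```python
class AssociativeArray:
    """Word-level model of an associative array with one tag bit per cell."""

    def __init__(self, words):
        self.words = list(words)
        self.tags = [False] * len(self.words)

    def compare(self, comparand, mask):
        """Tag every cell whose masked contents equal the masked comparand."""
        self.tags = [(w & mask) == (comparand & mask) for w in self.words]
        return any(self.tags)  # plays the role of the responder signal

    def write(self, value):
        """Write a word into the responding cells only."""
        self.words = [value if t else w for w, t in zip(self.words, self.tags)]

    def read(self):
        """Return the bit-by-bit logical OR of all responding cells."""
        out = 0
        for w, t in zip(self.words, self.tags):
            if t:
                out |= w
        return out
```

Searching three cells holding 0x12, 0x34, and 0x16 for a high nibble of 1 tags the first and third cells, and a read then returns the OR of their contents.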

This architecture can easily implement the 
associative memory and handle associative 
processing tasks. The feature set of the systolic 
array chip performs all the operations needed 
to form an associative array, and when coupled 
with a programmable control circuit, it also 
serves as an associative processor. 


Maintaining control 


The GAPP chips themselves form the asso- 
ciative array. However, since these devices do 
not have any control features, an external con- 
troller must be used to sequence the array and 
to oversee the target-word and the mask reg- 
ister. It consists of a control store, address se- 
quencer, and host computer (Fig. 2). 

The associative array can be built in various 
sizes, since the chips can be ganged together to 
create larger arrays. Because each processor 
element within a grid of chips is a serial pro- 
cessor, word widths greater than one bit must 
be emulated serially. The on-chip RAM lets the 
elements handle a variety of word widths, as 
well as multiple words and multiple tag bits 
since every RAM location is individually ad- 
dressable. Further, there will be some RAM left 
over that can be used as a scratchpad or for 
storing arithmetic operands and other data. 

Input and output for the full array are
handled via the communications (CM) bus. Input
usually comes from traditional processors
in word-serial and bit-parallel form. However,
because processor elements accept only word-parallel
and bit-serial data, incoming information
must be reformatted. This job, often
referred to as corner turning, can be implemented
with GAPP devices or with special


The control section can be set up in a number
of ways, but the most effective is to use a programmable
controller. Whatever route is
taken, though, the controller remains responsible
for generating instructions and addresses
for the array and for sampling the output of the
responder detection circuit. The control store
receives its list of instructions from the host.
Bit-serial data can quickly be sent to all processor
elements using the global input, which is
easily accomplished by employing the op code
lines to command the C register to load either a
1 or a 0.

Programming the associative processor is
different from programming a conventional 
processor with RAM. That is due both to the dif- 
ferent type of search involved and to the array’s 
architecture. In operation, the instructions 
supplied by the contro] unit to the associative 
array are mnemonics for the chip. The control- 


parallel-to-serial circuitry. 
Doubling up 


The north/south (NS) register of the chip
pulls double duty, acting both as a place to store
the tag bit and as an area to carry out various
functions. The tags of all the processor elements
can be quickly sent to the controller using
the global output signal (see first article in
series, ELECTRONIC DESIGN, Oct. 31, p. 207, for
definition of global output). Several GAPP
chips can then be combined in a wired-OR configuration
to generate a responder signal for
the control unit.


[Figure 2. Blocks shown: responder (global output); grid of 48 × 48 associative processor cells (32 GAPP chips); mask register; target-word register; address sequencer; 4k × 24-bit control store; CMN and CMS ports; line buffer array of 12 × 48 processor elements (8 GAPP chips); bus to host.]

2. An associative memory controller (highlighted) consists of a control
store, which holds commands coming from the host, and a sequencer.
The sequencer can use data from responders to branch to another part
of a program.



ease of using primitives can be seen in the exact
match function, which differs from the compare
primitive only in that the latter emulates a
aoe machine by operating on a word bit
by bit.

As in the foregoing example, the responders 
are searched for an exact match to the masked 
target-word register, with matches indicated 
by setting the tag bit. Specifically, a response 
bit is maintained at location R in the RAM
(Program 2). If R is set, that particular cell contains
a possible match to the broadcast data. If
R is clear, that cell does not contain a possible 
match. At the end of the algorithm, all cells 
marked as possible matches are determined to 
be exact matches for the full word. This ap- 
proach takes 5.5M+2 cycles, with M again rep- 
resenting the number of bits being compared. 
(This rate assumes that the number of 1s and 0s 
is equal.) 
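
The exact-match loop and its cycle count can be modeled in Python. The per-bit costs used here, five instructions when the comparand bit is 0 and six when it is 1, plus two instructions of setup, are one reading of the Program 2 listing that reproduces the 5.5M+2 average; they are an assumption of this sketch, as is the function name `exact_match`.

```python
def exact_match(cells, comparand, mask, width):
    """Bit-serial exact match over all cells at once.

    Returns the response bit of every cell and an estimated
    instruction count. Bit index 0 is the MSB, as in the text.
    """
    response = [1] * len(cells)   # every cell starts as a possible match
    instructions = 2              # setup: load the tag into R
    for i in range(width):
        shift = width - 1 - i
        if not (mask >> shift) & 1:
            continue              # masked-out bits are skipped
        bit = (comparand >> shift) & 1
        instructions += 5 if bit == 0 else 6
        for k, cell in enumerate(cells):
            if (cell >> shift) & 1 != bit:
                response[k] = 0   # a mismatch clears the response bit
    return response, instructions
```

With a 4-bit comparand of 1010 and a full mask, the count comes to 24, which equals 5.5 × 4 + 2.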

The write primitive is also handled with a
bit-wide scheme. It, too, can be repeated for
larger words. In this algorithm, the tag bit is
restored to the NS register near the end of the
program to ensure that multiple invocations
will proceed smoothly. The restoration is combined
with an OR operation, so no added cycles
are demanded.


Program 3. A writing lesson


/* Write function */
/* Load contents of addr into ew */
ew:=ram(addr), c:=0;
/* produce AND of not(tag), assumed to be in NS, with
   contents of addr */
c:=bw;
ram(temp):=c, c:=1;
/* load value into ew, tag assumed to be in NS */
if (value == 0)
    ew:=0, c:=0;
else
    ew:=c, c:=0;
/* AND value and tag */
c:=cy;
/* Load intermediate values in anticipation of OR */
ns:=c, ew:=ram(temp), c:=1;
/* perform OR and restore tag */
c:=cy, ns:=ram(tag);
ram(addr):=c;


First, load the cells 


The write algorithm begins by placing a Bool- 
ean value into each of the responder cells (Pro- 
gram 3). Since the chip treats all locations 
equally, some care must be taken to write only 
to the responders. To be sure that the non- 
responding cells are not modified, the tag bit for 
each responder must first be inverted and then 
ANDed with the data in RAM. 

Once this is done, the tag of the original responders
is ANDed with the new value. The results
are then ORed together; this effectively
blocks the nonresponders from accepting the 
new value. The data is then written into the 
addr of the responders. 
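
The logic of the write can be stated compactly: in each cell, the new bit is (NOT tag AND old) OR (tag AND value). A Python sketch, with the hypothetical name `masked_write`, applies it to a row of cells.

```python
def masked_write(old_bits, tags, value):
    """Write `value` (0 or 1) into responding cells only.

    `old_bits` and `tags` hold one bit per cell. Nonresponders
    (tag 0) keep their old bit; responders (tag 1) take the new value.
    """
    return [((1 - t) & old) | (t & value) for old, t in zip(old_bits, tags)]
```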

As mentioned, these primitives can be used 
as building blocks for more complex tasks. A 
limit search, for example, finds the set of re- 
sponders with a value greater than, greater 
than or equal to, less than, or less than or equal 
to a particular number. All of this can be done 
with just the compare and write algorithms. 
Basically, all the responders are designated as 
greater than, equal to, or less than the desired 
value. Once that is done, it is simple to select 
any combination of sets. 


Call and response 


The three response bits, X, Y, and Z, all are 
held in RAM (Program 4). The first either indi- 
cates equality or denotes an undecided state 
that requires further processing. The second 
designates greater than, and Z signifies less 
than. Initially, all responding cells are set to the 
undecided state, or 100. If the MSB of the target 
number is a 1, then all the undecided elements
that have a 0 as their MSBs will be marked less than.
If the target's MSB is a 0, all the elements that
show a 1 as their MSBs will be marked greater
than. This sequence continues through the full 
number of bits making up the target, and at the 
end of the word, those cells in which all bits are 
undecided are determined to be equal to the de- 
sired number. The limit search takes 24+29M 
cycles. 
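
The MSB-first classification can be followed in a short Python model. It keeps one state per cell instead of the three response bits X, Y, and Z, which is a simplification for readability; the function name `limit_search` is hypothetical.

```python
def limit_search(cells, target, width):
    """Classify every cell as 'less', 'greater', or 'equal' relative to target.

    Bits are examined from the MSB down. A cell leaves the undecided
    state at the first bit where it differs from the target.
    """
    state = ["undecided"] * len(cells)
    for shift in range(width - 1, -1, -1):
        t = (target >> shift) & 1
        for k, cell in enumerate(cells):
            if state[k] != "undecided":
                continue                  # already classified
            b = (cell >> shift) & 1
            if t == 1 and b == 0:
                state[k] = "less"
            elif t == 0 and b == 1:
                state[k] = "greater"
    # Cells still undecided after the last bit match the target exactly.
    return ["equal" if s == "undecided" else s for s in state]
```

Selecting, say, all cells greater than or equal to the target is then a matter of combining the "greater" and "equal" sets.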

Once a responder or class of responders is 




ler language in the algorithms used by the chip 
employs a syntax similar to that of the C lan- 
guage. References to the bits in the mask and 
target-word registers take the form of a bit 
number. The register’s MSBis assigned a0, and 
the LSB is designated M—1, where M equals the 
word length. The RAM associated with each 
processor element is similarly labeled, with the 


Program 1. Making a comparison 


/* Compare function */
/* Load the NS and EW registers */
if (value == 0)
    ew:=0, ns:=ram(addr), c:=1;
else {
    c:=1;
    ew:=c, ns:=ram(addr);
}
/* EXNOR into NS reg */
ns:=ram(temp), ram(temp):=sm;
/* AND result with TAG */
ew:=ram(tag), c:=0, ns:=ns;
c:=cy;
/* Place results in RAM and NS */
ram(tag):=c, ns:=c;


Program 2. The match game 


/* TAG exists in NS */
/* Load tag into R */
c:=0, ew:=0;
ram(R):=sm;
/* Loop for every bit in the word */
for (i=0; i<M; i++) {
    if (mask(i) == 1) then {
        /* Load ew and ns with the values to be compared */
        if (comparand(i) == 0) then
            ew:=0, ns:=ram(cell+i), c:=0;
        else {
            c:=1;
            ew:=c, ns:=ram(cell+i), c:=0;
        }
        /* EXOR operands */
        ns:=ram(temp), ram(temp):=sm;
        /* Update R bit and tag */
        ew:=ram(R), c:=0;
        c:=cy;
        ram(R):=c, ns:=c;
    }
}








contents of the data word appearing in loca- 
tions “Cell” through “Cell +M—1”. Response 
bits are stored in additional RAM locations and 
are typically used by the algorithms. 


Time for our program 


Since, as noted, a system consisting of GAPP 
chips is bit-serial, the most effective primitives 
are 1 bit wide. The bit-wide comparison prim- 
itive (Program 1) loads the comparand bit into 
all of the processor elements using one instruc- 
tion, an important technique for parallel data 
input. This is accomplished by using the set 0 in- 
struction of the East/West (EW) registers and 
the set 1 instruction of the C register. 

The compare algorithm also makes use of a 
time-saving scheme concerning the tag bit. 
Since the NS register, which normally retains 
the tag bit, is needed for many other tasks, the 
bit is simultaneously written to a RAM location
reserved for the tag (last line of Program 1). At 
the completion of any function that reports re- 
sults by marking the responders using the tag, 
the tag bit also appears in the NS register. 

In each of the responding cells, the compare 
algorithm sets the tag bit if the value residing 
at a given RAM address (addr) matches the
value sent from the host. This address can be 
any number between 0 and 127. 

The algorithm starts off by first loading the 
NS register with the value stored at addr. At the 
same time, the EW register is loaded with the 
value sent from the host. Next, the EW and NS 
registers are exclusive-NORed, and the result is 
placed in the NS register, where it is ANDed 
with the tag bit. The result of the last operation 
is placed in both the tag location and in the NS
register. Although this describes a 1-bit search, 
it could easily be invoked repeatedly to operate 
with larger words. 
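As a rough software model of the steps just described (exclusive-NOR of the stored bit with the host bit, then AND with the tag), the sketch below applies the 1-bit compare to every cell and then repeats it for a full word. The function names and the list representation of the array are assumptions made for illustration, not part of the GAPP instruction set.

```python
def compare_bit(ram_bits, tags, value):
    """One-bit compare over all cells: new tag = tag AND XNOR(stored bit, host bit)."""
    new_tags = []
    for ns, tag in zip(ram_bits, tags):
        ew = value                   # host bit broadcast to every cell
        xnor = 1 - (ns ^ ew)         # 1 where the stored bit equals the host bit
        new_tags.append(xnor & tag)  # only cells still tagged can keep responding
    return new_tags


def compare_word(words, target, m):
    """Invoke the 1-bit compare once per bit, MSB first, to match m-bit words."""
    tags = [1] * len(words)
    for i in range(m):
        shift = m - 1 - i
        bits = [(w >> shift) & 1 for w in words]
        tags = compare_bit(bits, tags, (target >> shift) & 1)
    return tags
```

A cell drops out of the responder set the first time one of its bits disagrees with the host value, so the tags that survive all M passes mark exact matches.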


Five or six cycles 


Only five cycles are needed to compare a 0, six
to compare a 1. In all of the examples, the assumption
is made that the controller can keep
the associative array operating at maximum 
speed at all times, so that no wait stages are re- 
quired. 

Primitives such as compare increase the pow- 
er available for larger programming tasks and 
also simplify the programmer’s chores. The 




third read does just that, employing the re- 
sponder signal supplied by the associative 
array to read the value of the responding cells. 
Doing so places the data in the responding cells 
at addr in the NS register. The routine: 


/* Load ns with TAG */
ns:=ram(tag);
/* Load ew with data */
ew:=ram(addr), c:=0;
/* AND tag and data */
c:=cy;
/* Place results in NS */
ns:=c;


makes good use of the fact that the tag bit is 
stored in memory, so it can be used con- 
secutively a number of times. The read is called 
up by logically ANDing the tag at ram(tag) with
the data at addr and then placing the results in 
the NS register. 

A read of this type would be desirable whenever
it is necessary to address portions of the
array individually at a unique address, as with
a conventional memory, instead of by the content
of memory. This is done by loading the data
words over the CM bus so that each element
holds a unique number in its RAM. An exact
match search (using either primitives or the 
larger program) could then be executed and the 
cell contents read. On a grid of systolic arrays 
516 elements on a side, the exact match would 
take a maximum of 114 cycles and the read 
would take 4 × M cycles.


Easing into associative processing 


Once an associative memory has been built 
from GAPP chips, it is relatively simple to en- 
hance it to create an associative processor. The 
primitives and techniques detailed permit a full 
range of searches, which can be combined with 
arithmetic or logical operations to further in- 
crease the system’s capabilities. 

As simple as it sounds, one of the functions 
most useful to associative memories is counting,
or simply summing the number of responders.
Its importance can be seen in a limit
search, in which it might be vital to know how 
many bits fall within a set of limits. Doing so 
relies on the chip’s ability to perform serial 
arithmetic, keeping track of the number of cells 
that have the tag bits set. 


A related function, called first, lets users pro- 
cess the bits on an individual basis. Once it is 
determined that there are multiple responders, 
it becomes important to turn off all but the de- 
sired one, so that the others are not altered 
when the operation takes place. 

One technique is to propagate a marker in the 
northwest corner cell of the array and carry it 
eastward along the top row until the eastern 
edge of the array is reached. It can then drop 
down a row and move in the opposite direction, 
continuing with its serpentine route across the 
full array. Since the marker resides in only one 
cell at a time, only that cell is on, so it is the only 
one being addressed at any given time. Sampl- 
ing the responder after each shift of the marker 
makes it possible to know when the first re- 
sponder has been reached. Another technique 
would be to prioritize multiple responders with 
the unique PE address. 
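The serpentine marker scan can be pictured with a short sketch. This is a sequential stand-in, assuming the tag bits are available as a list of rows; the name `first_responder` is hypothetical, and on the real array the marker is shifted between neighboring cells rather than indexed.

```python
def first_responder(tags):
    """Return (row, col) of the first tagged cell along the serpentine path.

    The marker starts in the northwest corner, moves east along the top row,
    drops down a row, moves west, and so on. Returns None if no cell responds.
    """
    for r, row in enumerate(tags):
        if r % 2 == 0:
            cols = range(len(row))               # even rows: west to east
        else:
            cols = range(len(row) - 1, -1, -1)   # odd rows: east to west
        for c in cols:
            if row[c]:                           # responder sampled after each shift
                return (r, c)
    return None
```

Because odd rows are traversed east to west, the first responder in such a row is its easternmost tagged cell.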

A final algorithm takes advantage of some of 
the processor element’s logical functions to cal- 
culate the number of bit positions by which the 
corresponding digits in each cell differ from 
those of the desired value. This number, or 
Hamming distance, forms the basis of numer- 
ous error correction codes. The algorithm: 


/* Store zero in COUNT field of all candidates */
for (i=0; i<M; i++)
    WRITE(COUNT + i, 0);
/* Compare all bits */
for (i=0; i<M; i++) {
    SET;
    COMPARE(CELL + i, COMPARAND(i));
    ADDONE;
}


is effectively a search for no match, which is fol- 
lowed by incrementing a count field in all re- 
sponders. Once the Hamming distance is deter- 
mined for each word in the associative array, a 
least-value search can be performed on the 
count field to find the word with the smallest 
distance. Every time no match is found, the 
counter is incremented by one.
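A small software model may help make the count-then-search idea concrete. It is a sketch only: the names `hamming_counts` and `closest_word` are hypothetical, the inner loop replaces the parallel increment across all responders, and a plain `min` stands in for the least-value search on the count field.

```python
def hamming_counts(words, target, m):
    """Count, for every m-bit word, the bit positions that differ from target."""
    counts = [0] * len(words)
    for i in range(m):                       # one pass per bit position
        shift = m - 1 - i
        t_bit = (target >> shift) & 1
        for k, word in enumerate(words):
            if ((word >> shift) & 1) != t_bit:
                counts[k] += 1               # "no match": increment this cell's count
    return counts


def closest_word(words, target, m):
    """Least-value search on the counts: the word with the smallest distance."""
    counts = hamming_counts(words, target, m)
    return words[counts.index(min(counts))]
```

When several words share the smallest distance, this sketch returns the first of them; the article does not say how the array would break such ties.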







identified, it is often important to interrogate 
the array to read its values. The most obvious 
way to accomplish this is to shift the contents of 
each cell out via the CM bus. The data words 
would have to be prefaced by the tag bit, so that 
the external circuitry can determine which of 
the words are responders. Obviously, this 
approach operates on both responders and non- 
responders, but this will not matter in many ap- 
plications. 

Nevertheless, there are two drawbacks to 
this method. First, if a large array is chosen to 
increase throughput, the time needed to shift 
all the data out can be significant (see the table 
below). Second, the requisite external circuitry 


Program 4. In search of limits


/* Mark all elements as undecided */
WRITE(X, 1);
WRITE(Y, 0);
WRITE(Z, 0);
/* Loop through all bits */
for (i=0; i<M; i++) {
    /* Reset Tag */
    ns:=0;
    if (comparand(i) == 0) {
        /* Locate all undecided cells */
        COMPARE(X, 1);
        /* Locate greater than cells */
        COMPARE(cell+i, 1);
        /* Mark responders as greater than */
        WRITE(X, 0);
        WRITE(Y, 1);
    }
    else {
        /* Locate all undecided cells */
        COMPARE(X, 1);
        /* Locate less than cells */
        COMPARE(cell+i, 0);
        /* Mark responders as less than */
        WRITE(X, 0);
        WRITE(Z, 1);
    }
}


Loading or reading time for a single-bit plane

Dimensions of array      Total number of        Total number of
(processor elements)     processor elements     GAPP chips

48 × 48                  2,304
132 × 132                17,424
516 × 516                266,256
1032 × 1032              1,065,024


inherent to this scheme is substantial. How- 
ever, in tasks like image processing, where large 
portions of the data are of interest, this still 
might be an effective technique. 


A question of comparison 


A second approach simply employs the com- 
pare instruction to read the value of a bit using 
the responder signal. Comparing addr with a 0 
effectively shifts the data in along with the re- 
sponder signal, where it can then be shifted into 
the target-word register. A compare between 
the addr and 1 shifts the inverted data in with 
the responder signal. If there is more than one 
responder, the OR of all the data bits is posi- 
tioned on the responder signal. 

This method can be called into play when it is 
necessary, for instance, to establish the num- 
bers of pixels that are at their maximum values. 
The algorithm for determining which of the 
words in memory has the highest value uses the 
responders first as decision-making data and 
then to read the maximum value. The first two 
steps of the following algorithm effectively 
amount to an instruction that sets all the tag
bits: 

for (i=0; i<M; i++) {
    /* Search for undecided cells */
    SET;
    COMPARE(R, 1);
    COMPARE((CELL + i), 1);
    if (RESPONDER)
        WRITEN(R, 0);
}


The program first scans all the memory cells, 
looking at the most significant bits. If any cell
has a 1 in that position, it is noted that it holds 
the highest value. All the cells with 0s are 
marked as storing less than highest. When the 
bits are all processed, the cells containing the 
highest value at each bit are determined to 
contain the maximum word value. This will 
take 14 cycles for each bit of the data. 
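The maximum search described above can be modeled in a few lines. The sketch assumes a list of integers in place of the cell RAM and a list `r` in place of the R bits; `find_maximum` is a hypothetical name, and `any(...)` plays the role of the responder signal that the controller samples.

```python
def find_maximum(words, m):
    """Return (maximum value, R bits) for a list of m-bit words.

    R marks the cells still in the running. At each bit, MSB first, the
    candidates holding a 1 respond; if any do, the others are dropped.
    """
    r = [1] * len(words)
    max_value = 0
    for i in range(m):
        shift = m - 1 - i
        responders = [cand & ((w >> shift) & 1) for w, cand in zip(words, r)]
        max_value <<= 1
        if any(responders):          # responder signal seen by the controller
            r = responders           # non-responders lose their R bit
            max_value |= 1           # this bit of the maximum is a 1
    return max_value, r
```

The same pass both decides which cells survive and reads out the maximum one bit at a time, which is why the responders serve first as decision-making data and then as the read path.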


Cutting down on overhead 


The program uses the compare algorithm 
whether or not the data is to be read out. It is 
possible, though, to read the contents of a cell 
without incurring the overhead associated with 
the exclusive-OR used in a comparison. The 




tially processing each bit, that is, it is a word- 
parallel, bit-serial processor.

A basic relational data-base scheme can be
implemented with the systolic array, but since 
its architecture is not similar to that of von 
Neumann processors, the architecture of the 
relational data-base system will also differ. 
The systolic array processor can be used in a
subsystem or as part of another system. 


An accent on speed 


The chip’s single-instruction, multiple data- 
path (SIMD) architecture increases through- 
put in common tasks within the relational data 
base. The chip, arranged in rows and columns of 
processing elements, handles rows and columns 
of data. 


For instance, in a relational Join operation


(Fig. 1), two tables (a) are linked to create a 
third table (b). The array forms each row in the 
result by joining two rows, one from each of the 
tables. The selected rows each have a common 







element. A Semi-Join operation produces an 
output from only one of the tables, but the 
items selected for output depend upon the sec- 
ond table (c). 
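For readers less familiar with the two operations, here is a minimal sequential sketch of Join and Semi-Join on lists of dictionaries. The function names, field names, and sample rows are illustrative assumptions, not the data of Fig. 1, and the nested loop is exactly the row-by-row work that the array is meant to do in parallel.

```python
def join(left, right, key):
    """Join: combine every pair of rows, one from each table, that share `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def semi_join(left, right, key):
    """Semi-Join: rows of `left` only, kept when `right` has a matching `key`."""
    keys = {r[key] for r in right}
    return [l for l in left if l[key] in keys]


# Hypothetical sample tables for illustration.
students = [
    {"name": "John", "course": "Math"},
    {"name": "Peter", "course": "English"},
]
courses = [{"course": "Math", "number": 201}]

joined = join(students, courses, "course")
enrolled = semi_join(students, courses, "course")
```

Here `joined` holds one combined row for John, while `enrolled` holds John's original row unchanged, since a Semi-Join never adds columns from the second table.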

In a typical processing system, each of these 
steps must be handled sequentially. However, 
the systolic array can work with all the rows 
and tables at the same time, producing the Join 
tables much faster. Each processor element in 
the array works on an individual datum (item
of data). The array can be used with various 
data base formats: a number of chips can be tied 
together to make a large grid of processor ele- 
ments, thereby increasing throughput. 

Since a parallel processor handles data faster 
than its traditional counterpart, its I/O rates 
must correspondingly be faster. Mainframe 
parallel processors have used three major ap- 
proaches, and all can be used with systems built 
with the systolic array. The earliest technique 
dedicated a head for each track of the fixed disk 
that the data-base management system used 


[Figure 1 tables: (a) files 1 and 2, (b) joined file 3, (c) Semi-Join result]


1. The systolic array chip can easily handle such relational data-base 
operations as Join and Semi-Join. A Join operation, for instance, links 
files 1 and 2 (a) to form file 3 (b). The Semi-Join operation then sear- 
ches the newly formed table, pulling out which students are enrolled in 
a mathematics class (c). 







DESIGN ENTRY 





Systolic arrays fill the bill 
as data-base management
heads for gigabyte range 





Parallel-processing building blocks with distributed 
memories offer speed and ease of use in systems 
where von Neumann architectures would falter. 





This is the last in a five-part series on the first commercial
systolic array processor chip. The initial article
was the cover story of the Oct. 31 issue, and with
the exception of Dec. 27, an installment has appeared
in every succeeding issue.


Data-base management has become increasingly
important in recent years,
especially for relational data bases.
However, as the size of these data bases moves 
toward the gigabyte range, conventional von 
Neumann architectures are too slow to meet de- 
mands. In large part, this is because the basics 


Alexis Koster and Norman Sondak 
San Diego State University 
Paul Sullivan, NCR Microelectronics 


An associate professor in the Information Systems De- 
partment of San Diego State University, Alexis Koster
concentrates his research on such parallel program- 
ming languages as Prolog. Previously, he developed 
language processors for NCR. Koster has a PhD in 
computer science from the University of North Caroli- 
na at Chapel Hill. 


Norman Sondak is chairman of San Diego State's In- 

formation Systems Department, where he is involved 

with designing computer and information systems. He 
has led the computer science department at Worcester 
Polytechnic Institute and earned a PhD from Yale. 


Paul Sullivan is the business unit director for digital 
signal-processing devices at NCR's Microelectronics 
Division in Fort Collins, Colo. Before joining the com- 
pany, he worked with Hughes Research Laboratories. 
He holds a PhD and an MSEE from the University of 
Southern California. 


Reprinted from ELECTRONIC DESIGN - January 10, 1985 


of data-base management — storing, retrieving, 
searching, updating, deleting, merging and or- 
dering data—are not numerical operations. 
System designers must spend considerable 
time translating the basic data-base commands 
into host instruction sets for use on conven- 
tional processors. 

Compounding this problem is a conflict be- 
tween the requirements of operating systems 
and data-base managers. Ideally, a data-base 
system should store indexes in locations that 
contain only the information the indexes refer- 
ence. However, operating systems distribute 
data to make the best use of available storage. 
Moreover, there is a tendency in von Neumann 
virtual memory systems to swap out pages of 
data frequently used by the data base. The big- 
gest bottleneck of the von Neumann architec- 
ture is that all data processing is sequential. 
The net result of all these factors is a great in- 
crease in the data that a system must get from 
memory —sometimes 10 times more than is ac- 
tually needed. 

A practical solution is to develop parallel pro- 
cessing data-base management systems, and 
designers can do this with the Geometric Arith- 
metic Parallel Processor (GAPP), a systolic 
array containing 72 single-bit processors. Each 
processor has 128 bits of RAM. The chip, the 
first commercially available two-dimensional
systolic array, processes data words in parallel, 
working on words of varying lengths by sequen- 









one of the RAM addresses. When the desired 
word is found, its replacement can easily be 
substituted while data streams through the 
array at rates up to a million characters per second.


Shuffling the cards 


Another major task of any data-base man- 
agement system is sorting. This involves a bit- 
comparison operation that is somewhat similar 
to the compare operation. But in comparing bits 
for sorting, the system must determine not only 
whether the data in the NS register matches 
that in the EW register, but also whether it is 
greater or less than that contained in the EW 
register. 

The system does this three-way comparison 
by examining both the Sum (SM) and Borrow 
(BW) outputs from the ALU (see the table, 
below). If condition C = 0 and the data in the NS
register matches the data in the EW register, 
then SM = 0. If the data in the NS register does 
not match, then SM = 1. However, when the 
data is not matched, BW = 1 if the data in the 
NS register is less than the data in the EW reg- 
ister. Alternatively, BW = 0 if the data in the 
NS register is greater than the data in the EW register.
The “match-finding” program performs
this operation and stores the SM and BW results
in RAM 2 and RAM 3, respectively.
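The three-way test can be checked with a small model of the two ALU outputs. The sketch assumes that BW behaves as the borrow of a standard 1-bit full subtractor (NS minus EW minus C), which agrees with the behavior described above for C = 0; the function names are hypothetical.

```python
def three_way_bit(ns, ew, c=0):
    """Return (SM, BW) for one bit position.

    SM is the sum bit (NS xor EW xor C). BW is modeled as the borrow of
    NS - EW - C, an assumption consistent with the text for C = 0.
    """
    sm = ns ^ ew ^ c
    not_ns = 1 - ns
    bw = (not_ns & ew) | (not_ns & c) | (ew & c)
    return sm, bw


def compare_records(a_bits, b_bits):
    """Scan MSB first; the first mismatched bit (SM = 1) decides the order."""
    for ns, ew in zip(a_bits, b_bits):
        sm, bw = three_way_bit(ns, ew)
        if sm:
            return "less" if bw else "greater"
    return "equal"
```

With C = 0, SM alone answers "match or not", and BW is consulted only at the first position where SM is 1.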

This greater-than or less-than comparison 
forms the basis of the second building block, the 
sorter. It performs a task common to all data- 


A three-way sorting comparison (C = 0)

NS   EW   SM   BW   Result
0    0    0    0    NS = EW
0    1    1    1    NS < EW
1    0    1    0    NS > EW
1    1    0    0    NS = EW





base systems: reordering data, based on the 
user’s request. With minor alterations, the tra- 
ditional ordering and sorting algorithms used 
with conventional serial processors can be eas- 
ily switched to take advantage of the parallel 
processing of the systolic array. The result is an 
increase in performance. 

The sorting algorithm used in a systolic array
system is basically just a parallel version of the 
classical exchange sort, in which pairs of num- 
bers or other data are exchanged to reflect their 
relative position in the sorting order. For in- 
stance, pairs of data are first compared in the 
order in which they are found in memory (1 and 
2, 3 and 4). Then they are paired again for the
next comparison (2 and 3, 4 and 5). At each com- 
parison, the pairs of items not in order are ex- 
changed (Fig. 3). 
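The alternating pairing just described is the pattern usually called odd-even transposition sort. The sketch below is a sequential stand-in under that assumption: each pass of the outer loop corresponds to one parallel compare-and-exchange step, and the name `exchange_sort` is hypothetical.

```python
def exchange_sort(items):
    """Sort by alternating compare-exchange passes over neighboring pairs.

    Even passes pair positions (0,1), (2,3), ...; odd passes pair (1,2),
    (3,4), .... All exchanges within one pass are independent of each
    other, so a parallel machine can perform a whole pass in one step.
    """
    data = list(items)
    n = len(data)
    for step in range(n):                    # n passes are enough for n items
        start = step % 2
        for i in range(start, n - 1, 2):
            if data[i] > data[i + 1]:        # pair out of order: exchange
                data[i], data[i + 1] = data[i + 1], data[i]
    return data
```

Since the pairs within a pass never overlap, the inner loop is the part a grid of processor elements would carry out simultaneously.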


Getting a perfect match 


To implement such a sorting algorithm with 
the systolic array chip, two strings comprising 
records A and B must be loaded into the array, 
much as was done in the comparator block. 
Each record is again assumed to consist of 
12 characters of 6 bits each. Once they are load- 
ed in, the SM and BW output bits are computed 
to find matches (see the program, p. 354). 

If the records do not match, the system must 
determine whether record A is less than or 
greater than record B. The systolic array does 
this by searching the bit string until the first 
unmatched character is found, then identifying 
the MSB that does not match, again using bit 
comparison. 

The array does this by shifting a marker, or 
“1” bit, in serpentine fashion through all the 
processor elements as shown in Figure 2. The 
marker propagates until it finds the “FIRST” responder,
where RAM2 = 1. Then RAM3 is examined
in that processor element to determine
whether the records should be swapped. 

During this operation the device must make 
an exchange or no-exchange decision before the 
records are loaded into the next row of sorter 
blocks. This may require the marker to propa- 
gate through all 72 processor elements within 
some chips. Meanwhile, other chips in the grid 
are idle until all devices have completed their 
swaps. 

Each sorting cycle for comparing pairs takes 




for storage. The technique also proposed a pro- 
cessor for each multiple head to furnish the 
utmost in throughput. 

But the high cost of multiple head disks, cou- 
pled with the expense of individual processors 
for each track, forced compromises. Now, high 
speed, moving-head technology has been devel- 
oped so that there is only one head for each disk 
surface, greatly trimming the requisite number 
of heads and processors. 

As memory prices have dropped, data-base 
architectures have moved toward cache mem- 
ories. Large disk caches increase speed, min- 
imizing the number of disk accesses while deliv- 
ering faster transfer rates than are possible 
with off-the-disk approaches. 

Regardless of the approach taken, the systol- 
ic array will generally be used only to perform 
dedicated data-base processing: its architec- 
ture does not lend itself to the diversity of tasks 
that must be performed by a host computer. 


Block by block 


As with other set-ups discussed in this series, 
the systolic array data-base management 
system can be built building-block style. 
Groups of GAPP devices can be put together, to 
form an SIMD array of the required size. In ad- 
dition, several different blocks can each per- 
form specific tasks so that the complete data 
base machine operates as a multiple-instruc- 
tion, multiple-data-path (MIMD) system. 

One block, for example, can address the basic 
task of any data-base manager: searching the 
memory to locate some required information or 
to determine that it is not in the system. The 
systolic array easily makes such comparisons 
in parallel. Consider a search for a 12-character 
comparand (the comparand is the required 
data that is compared with the data in memo- 
ry). If each character is made up of 6 bits, the 
12-character search can be handled neatly by 
the 72 processor elements of one chip (Fig. 2). 

The code for loading the comparand into the
chip is simple: CM:=CMS (repeated 12 times)
followed by the instruction EW:=RAM0,
RAM0:=CM. The EW register, one of four registers
in each processor element, is loaded via
the CMS line. Since several systolic arrays can 
be linked to form grids, it is easy to increase the 
size of a grid to match the typical word size and 


processing rates of the system. 

As the data is being fed into the grid, it 
streams through the chips, entering on the 
south and exiting at the north. After each char- 
acter is clocked into the array, an exclusive OR 
comparison is performed and the result placed 
in the NS register. If all characters in the array 
match, then the global output (GO) flag is high. 


I don’t care 


The comparand can be masked so unimpor- 
tant characters or bits represent a “don’t-care” 
condition. Locations being masked with the 
“don’t-care” condition will place a 0 in RAM 2,
while all other locations will have a 1 in RAM 2.
The result of the exclusive OR comparison is 
then ANDed with the mask in RAM2 before the 
result is placed in the NS register. 
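The masking step can be illustrated with a short sketch. It assumes one bit per processor element, a mask bit of 1 for "care" and 0 for "don't care" as stated above, and it treats a record as matching when no element is left flagged; the function name and that final polarity are assumptions for illustration.

```python
def masked_match(record, comparand, mask):
    """Exact match with don't-care positions.

    Each position computes XOR(record bit, comparand bit), which is 1 on a
    mismatch, and ANDs it with the mask bit so masked positions never flag.
    """
    flagged = [(r ^ c) & m for r, c, m in zip(record, comparand, mask)]
    return not any(flagged)
```

For example, a record that differs from the comparand only in a masked position still matches, while the same record fails once that position is unmasked.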

This type of exact match is useful for search- 
ing text files for specific information. In a com- 
mon variation of this operation, like “find and 
replace,” the replacement word can be stored in 


[Figure 2 labels: GAPP chip (6 × 12); input comparand or record B as 6-bit-wide words on CMS lines]





2. The chip can accept and process input record A, 
which comes from memory, and the comparand or 
input record B. If it cannot complete the processing 
in one pass, the signal can be wrapped around and 
run through again. 







matches for both the time and salary constants. 
The results are then passed to a third block, 
which performs the AND function that deter- 
mines acceptance or rejection for the two-part 
query. 

Table settings 


A related function that can be performed 
much more quickly with parallel processors is 
a multiple file operation like Join or Semi-Join.
A Semi-Join operation, the most common in 
most data-base processing systems, produces a 
subset of one table; this subset is determined by 
a relationship from a second table. 

This operation requires both sorter and com- 


pare blocks (Fig. 4). The search for “employees 
working on the data-base project” is a typical 
Semi-Join operation; it uses two files. The two 
in this example have a matching component, 
the Social Security number, that lets the sys- 
tem list the personnel on the project, even 
though their names are not in the project file. 
The task can be performed by several blocks of 
GAPP devices. The two files are passed through 
two sorter blocks. There the lists are put in or- 
der employing the Social Security numbers. 

A third block, a comparator, compares the 
data in the project record in memory to the con- 
stant—in this case, the “data-base” manage- 
ment project—selecting only the records that 


A match-finding program 


CM:=0,   NS:=0,    EW:=0,    C:=0,  RAM0:=C    /* Initialize */
CM:=CMS, NS:=S,    EW:=0,    C:=0,  RAM0:=C    /* Repeat this instruction until all characters of the two records to be compared are loaded into the array */
CM:=CM,  NS:=NS,   EW:=0,    C:=0,  RAM0:=SM   /* Write record A to RAM 0 */
CM:=CM,  NS:=NS,   EW:=RAM1, C:=0,  RAM1:=CM   /* Write record B to RAM 1 and EW */
CM:=CM,  NS:=RAM2, EW:=0,    C:=BW, RAM2:=SM   /* Write SM = A ⊕ B to RAM 2 and NS */
CM:=CM,  NS:=NS,   EW:=0,    C:=0,  RAM3:=C    /* Write BW = A < B to RAM 3 */
/* Test GO: if GO = 1 then record A matches record B */


[Figure 4 labels: generate name; name from file 1 is put out when SSN from that file is matched by GAPP 4; sorted list of SSN (output only if project = data-base management)]





4. A configuration of parallel processor building blocks can tackle data-base 
tasks. Here four chips search through two files to find the names of workers 
involved in a data-base management project.




less than 24 µs. Thus eight records can be sorted
through the eight stages in 192 µs, and eight
more records can be entered into the pipeline
every 24 µs.

A sort using the systolic array chips will take 
N steps to process N elements, compared with
N² steps for a typical von Neumann processor.


Forging links 


Systolic array building blocks can easily be 
linked into systems that perform the data-base 
functions. The comparator block alone can per- 
form queries involving only one file, such as 


[Figure 3 labels: records a through h; N, CMN, S, and CMS connections on each device]


“Find all employees hired before 1980.”

The data-base records are read in parallel 
and loaded into the grid of systolic array chips. 
Both the query field, “date of employment,” and 
the constant, “1980” are loaded into the com- 
parator. As the compare function is performed, 
the records that are less than “1980” are sent to 
the host, while the others are ignored. 

The use of two comparator blocks allows de- 
signers to perform more complex tasks, like 
finding “employees hired after 1980 whose 
salary is greater than $25,000.” Two blocks of 
systolic arrays can search in parallel to find 


[Figure 3 labels: after first sort; after final sort; these devices serve only as storage/delay elements]





3. The array chip compares the values “a, b, c...” along the bottom of
each row of processor elements and then rearranges them in the desired
rank until the proper values reach the top.




Microelectronics 


ALABAMA 

Rep, Inc. 

P.O. Box 4889 
Huntsville, AL 35815
(205) 881-9270 


ARIZONA 

BH&B Sales, Inc. 
7353 6th Avenue 
Scottsdale, AZ 85251 
(602) 994-4455 


BH&B Sales, Inc. 
1041 W. Comobabi 
Tucson, AZ 85704 
(602) 299-1508 


CALIFORNIA 

Custom Technology Sales 

992 S. Saratoga-Sunnyvale Rd. 
San Jose, CA 95129 

(408) 252-9901 


Earle Associates Incorporated 
7585 Ronson Road 

Suite 200 

San Diego, CA 92111 

(619) 278-5441 


Orion Sales, Inc.

828 E. Colorado Boulevard 
Suite F 

Glendale, CA 91205
(213) 240-3151


Orion Sales, Inc. 

285 East Main Street 
Tustin, CA 92680 

(714) 832-9687


COLORADO 
Electrodyne 

Suite 110 

2620 South Parker Road 
Aurora, CO 80014 

(303) 695-8903 


CONNECTICUT 

Data Mark, Inc.

47 Clapboard Hill Road 
Guilford, CT 06437 
(203) 453-0575 


FLORIDA 


Universal Marketing & Sales, Inc.
413 Martin Road
North Palm Beach, FL 33408


(305) 842-1440 


GEORGIA 

Rep, Inc. 

1944 Cooledge Road 
Tucker, GA 30084 
(404) 938-4358 


ILLINOIS 

Sieger Associates 

1350 Remington Road Suite UV 
Schaumburg, IL 60195 

(312) 310-8844 


NCR Microelectronics Division 


INDIANA 


Technology Marketing Corp. 


599 Industrial Drive
Carmel, IN 46032 
(317) 844-8462 


Technology Marketing Corp. 


3428 West Taylor Street 
Fort Wayne, IN 46804
(219) 432-5553 


KANSAS 

KEBCO, Inc.

10111 Santa Fe Drive
Overland Park, KS 66212
(913) 541-8431


KEBCO, Inc. 

16047 East Kellogg 
Wichita, KS 67236
(316) 733-1301


KENTUCKY 


Technology Marketing Corp. 


8819 Roman Court 
P.O. Box 91147 
Louisville, KY 40291
(502) 499-7808 


MARYLAND 
Marktron, Inc.

1688 East Gude Drive 
Rockville, MD 20850 
(301) 251-8990 


MICHIGAN 

Westbay & Associates 
27476 Five Mile Road
Livonia, MI 48154
(313) 421-7460


MINNESOTA 

Aldridge Associates, Inc.
7138 Shady Oak Road 
Eden Prairie, MN 55344 
(612) 944-8433 


MISSOURI

KEBCO, INC. 

75 Worthington Drive 
St. Louis, MO 63043 
(314) 576-4111 


NORTH CAROLINA 
Rep, Inc. 

7330 Chapel Hill Road
Suite 204 

Raleigh, NC 27607 
(919) 851-3007 


Rep, Inc. 

6407 Idlewild
Suite 226
Charlotte, NC 28212
(704) 563-5554 


NEW MEXICO 

Neico Electronix 

4801 General Bradley, N.E.
Albuquerque, NM 87111 
(505) 293-1399 


OHIO 

Bear Marketing, Inc. 
3623 Brecksville Road 
P.O. Box 177 
Richfield, OH 44286 
(216) 659-3131 


Sales Representatives 


OREGON 

Electronic Component Sales 
9755 Southwest Penbrook 
Tigard, OR 97223 

(503) 245-2342 


PENNSYLVANIA 
TCA Associates 

801 Media Line Road 
Broomall, PA 19008 
(215) 353-2022 


UTAH 

Electrodyne 

Suite 109 

2480 South Main Street 
Salt Lake City, UT 84115 
(801) 486-3801 


TENNESSEE 

Rep, Inc. 

P.O. Box 728 

Jefferson City, TN 37760
(615) 475-4405


TEXAS 

Oeler & Menelaides, Inc.
8340 Meadows Road 
Suite 224 

Dallas, TX 75231 

(214) 361-8876 


Oeler & Menelaides, Inc. 
9119 South Gessner 
Suite 201 

Houston, TX 77074 
(713) 772-0730 


Oeler & Menelaides, Inc. 
7113 Burnet Road 

Suite 207 

Austin, TX 78757 

(512) 453-0275 


WASHINGTON 

Electronic Component Sales 
9311 Southeast 36th Street 
Mercer Island, WA 98046 
(206) 232-9301 


CANADA 

Cantec Representatives, Inc. 
1573 Lapperriere Avenue 
Ottawa, Ontario, CANADA 
K1Z 7T3 

(613) 725-3704 


Cantec Representatives, Inc.

8 Strathearn Avenue - Unit #18 
Brampton, Ontario, CANADA 
L6T 4L8 

(416) 791-5922 


Cantec Representatives, Inc.
3639 Sources Road, Suite 116
Dollard des Ormeaux

Quebec, CANADA H9B 2K4 
(514) 683-6131 


EUROPE 

Manhattan Skyline, Ltd.
Manhattan House, Bridge Road 
Maidenhead, Berkshire 
ENGLAND SL6 8DB 
Maidenhead (0628) 39735 




match. The results are then passed to another 
comparator block, with only the Social Security 
number going to this group of systolic arrays. 
This sorter block then looks for matches with 
the file that has both names and Social Security 
numbers. When a Social Security number from 
the employee file is not in the list on the com- 
parator block, that record is removed from the 
buffer by the selector. When the two records 
match, the employee’s name is passed through 
the selector to the host computer. This type of 
parallel processing architecture provides 
several orders of magnitude higher throughput 
than a von Neumann machine. 
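
The record-matching flow described above can be modeled sequentially in a few lines. This is only a sketch of the data flow, not of the systolic hardware: the function name, the `(name, ssn)` tuple layout, and the use of a Python set to stand in for the comparator block are all assumptions made for illustration.

```python
def select_matching_names(employee_records, ssn_list):
    """Return the names of employees whose Social Security number is in ssn_list.

    employee_records: iterable of (name, ssn) tuples (hypothetical layout).
    ssn_list: the numbers held by the comparator block (modeled as a set).
    """
    wanted = set(ssn_list)
    selected = []
    for name, ssn in employee_records:
        if ssn in wanted:
            # Match: the selector passes the name on to the host.
            selected.append(name)
        # No match: the record is simply dropped from the buffer.
    return selected
```

In the array itself the comparisons would run in parallel across many records; the loop here performs them one at a time, which is the von Neumann behavior the article contrasts against.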

By using building blocks in this fashion, a de- 
signer can easily create a full system divided in- 
to five units: the host, the systolic array con- 
troller, the systolic array blocks, a switching 
matrix, and a storage device (Fig. 5). 

The host performs typical tasks, including 
processing and compiling queries, issuing com- 
mands to the systolic array, and receiving 





responses, as well as handling user communica- 
tions and interfacing. The GAPP controller 
need be no more than a dedicated microcomput- 
er. It receives commands from the host and an- 
alyzes them, then dispatches programs and I/O 
commands for the array. It also receives the 
output from the arrays when the tasks are com- 
pleted. The controller then selects data from 
this response and sends the appropriate data to 
the host. Each block of chips in the system has 
an address, so the controller can distribute pro- 
grams over the appropriate control lines. 

The switching matrix serves as the link be- 
tween the storage device and the systolic array 
building blocks. Storage can be handled by 
disks or cache memory. 


5. Systolic array chips can be called into duty in a data-base 
management system, forming a processor block, a controller, and a 
switching matrix. The matrix prepares data for array processing. 


Copyright © 1984 by Hayden Publishing Co., Inc., Hasbrouck Heights, NJ, USA. All rights reserved. Printed in USA. 




NCR, 


Microelectronics 


PIONEER-STANDARD ELECTRONICS, INC. 


ALABAMA 

3207 Putman Drive N.W. 
Huntsville, AL 35805 
(205) 837-9300 


CONNECTICUT 
112 Main Street 
Norwalk, CT 06851 
(203) 853-1515 


FLORIDA 

221 N. Lake Blvd. 

Altamonte Springs, FL 32701 
(305) 834-9090 


1500 Northwest 62nd Street 
Ft. Lauderdale, FL 33309 
(305) 771-7520 


GEORGIA 

5835 B Peachtree Corners East 
Norcross, GA 30092 

(404) 448-1711 


ILLINOIS 

1551 Carmen Drive 

Elk Grove Village, IL 60007 
(312) 437-9680 


INDIANA 

6408 Castleplace Drive 
Indianapolis, IN 48250 
(317) 849-7300 


MASSACHUSETTS 
44 Hartwell Avenue 
Lexington, MA 02173 
(617) 861-9200 


MARYLAND 

9100 Gaither Road 
Gaithersburg, MD 20760 
(301) 921-0660 


MICHIGAN 

13485 Stamford 
Livonia, MI 48150 
(313) 525-1800 


MINNESOTA 

10203 Bren Road East 
Minnetonka, MN 55343 
(612) 935-5444 


NEW JERSEY 

45 Rt. 46 

Pine Brook, NJ 07058 
(201) 227-1262 


NEW YORK 

1806 Vestal Parkway 
Vestal, NY 13902 
(607) 748-8211 


840 Fairport Park 
Fairport, NY 14450 
(716) 381-7076 


40 Oser Avenue 
Hauppauge, NY 11787 
(516) 231-9200 


60 Crossways Park West 
Woodbury, NY 11797 
(516) 921-8700 


NORTH CAROLINA 


9801 A Southern Pine Blvd. 


Charlotte, NC 28210 
(704) 527-8188 


OHIO 

4800 East 131st Street 
Cleveland, OH 44105 
(216) 587-3600 


4433 Interpoint Blvd. 
Dayton, OH 45424 
(513) 236-9900 


PENNSYLVANIA 
261 Gibraltar Road 
Horsham, PA 19044 
(215) 674-4000 


259 Kappa Drive 
Pittsburgh, PA 15238 
(412) 782-2300 


TEXAS 

9901 Burnet Road 
Austin, TX 78758 
(512) 835-4000 


13710 Omega Road 
Dallas, TX 75240 
(214) 386-7300 


5853 Point West Drive 
Houston, TX 77036 
(713) 988-5555 


NCR Microelectronics 





Distributors 


WYLE LABORATORIES 


ARIZONA 

8155 North 24th Street 
Phoenix, AZ 85021 
(602) 249-2232 


CALIFORNIA 

124 Maryland St. 

El Segundo, CA 90245 
(213) 322-8100 


17872 Cowan Ave. 
Irvine, CA 92714 
(714) 863-9953 


9525 Chesapeake Dr. 
San Diego, CA 92123 
(619) 565-9171 


3000 Bowers Ave. 
Santa Clara, CA 95051 
(408) 727-2500 


41151 Sun Center Dr 
Rancho Cordova, CA 95670 
(916) 638-5282 


COLORADO 

451 East 124th Ave. 
Thornton, CO 80241 
(303) 457-9953 


OREGON 

5289 N.E. Elam Young Pkwy., Bldg. 100 
Hillsboro, OR 97123 

(503) 640-6000 


TEXAS 

1810 Greenville Or. 
Richardson, TX 75081 
(214) 235-9953 


11001 South Wilcrest 
Suite 100 

Houston, TX 77099 
(713) 879-9953 


2120 Braker Lane, Suite F 
Austin, TX 78758 
(512) 834-9957 


UTAH 

1959 S. 4130 West 

Salt Lake City, UT 84104 
(801) 974-9953 


WASHINGTON 

1750 132nd Ave., N.E. 
Bellevue, WA 98005 
(206) 453-8300 


MANHATTAN SKYLINE 


UNITED KINGDOM 
Manhattan House 

Bridge Road 

Maidenhead 

Berkshire SL6 8DB 
England 

Maidenhead (0628) 75851 


NCR Microelectronics 


EASTERN AREA SALES OFFICE 


NCR Microelectronics Division 
400 W. Cummings Park 

Suite 2750 

Woburn, MA 01801 

Phone: (617) 933-0778 


CENTRAL AREA SALES OFFICE 


NCR Microelectronics Division 
400 Chisholm Place 

Suite 100 

Plano, TX 75075 

Phone: (214) 578-9113 


WESTERN AREA SALES OFFICE 


NCR Microelectronics Division 
4655 Old Ironsides Drive 
Suite 400 

Santa Clara, CA 95050 
Phone: (408) 727-6575 


NCR 


Microelectronics 










NCR CORPORATION 


Microelectronics 7 
Division 




NCR Microelectronics Division, 2001 Danfield Ct., Fort Collins, Colorado 80525 
Telex: 045-4505 NCRMICRO FTCN Phone: 303/226-9500 303/223-5100 







NCR’s Commitment to Quality 


As a pioneer in microelectronic tech- 
nology, NCR has been manufacturing 
components for its own product line 
since 1971. This experience has pro- 
vided opportunities to learn about 
user application problems, the impor- 
tance of component quality and re- 
liability, and their effects on total 
system reliability. The net result of 
such experience is a dedication to 
manufacturing superior components 
based on a firm commitment to quality 
and reliability. 


NCR Quality Assurance completes a rigorous evaluation of each 
product to ensure conformance of the product to its specification. 
Once a component is approved for production, stringent process and 
assembly controls along with detailed inspections are used to build 
in reliability. Comprehensive electrical testing is performed to 
guarantee the performance of each component; finished products are 
inspected before shipment to assure the conformance to specification 
of each lot of devices, and sampling plans are constantly revised 
and updated to improve quality. 

Essential to any reliability program is feedback from the system 
user, communication that is vital for reliability growth. NCR 
strives to ‘close the loop’ by communicating with users to evaluate 
problems and respond with corrective action. The closed-loop concept 
results in better understanding of user needs while improving 
reliability. 

The NCR commitment to quality and reliability is an integral part of 
corporate philosophy originating from and emphasized by the highest 
levels of NCR management. This management direction, combined with 
NCR's manufacturing and user application experience, provides a 
solid framework for continued improvement in quality and 
reliability. 







NCR Microelectronics 


NCR, a multi-billion-dollar manufac- 
turer of computer systems, terminal 
products, and semiconductors, es- 
tablished its first microelectronics labo- 
ratory in 1963 to stay abreast of the 
emerging semiconductor technology. 
The laboratory was expanded in 1966 
to provide limited quantities of proto- 
type microcircuits designed for use in 
a number of new products. By 1968 
the first MOS circuits were produced, 
and by 1970 a complete family of cir- 
cuits had been designed, produced in 
prototype quantities, and incorporated 
into new NCR products. Based upon 
knowledge gained in this research 
and confidence in the ultimate advan- 
tages of MOS, the decision was made 
to expand the internal production ca- 
pability. In 1971, the Miamisburg, 
Ohio plant was completed. 


To meet internal demand, NCR ex- 
panded its microelectronics operation 
in 1975 with the addition of a second 
production facility in Colorado 
Springs, Colorado, and in 1979 
added a third facility in Ft. Collins, Col- 
orado. The Colorado Springs facility 
was replaced in 1982 by a new plant 
occupying 100,000 square feet. This 
new plant is one of the most modern, 
best-equipped facilities of its kind any- 
where. 


NCR Microelectronics manufactures 
state-of-the-art NMOS, CMOS, and 
non-volatile SNOS components which 
provide a competitive advantage to its 
computer systems and terminal prod- 
uct lines. 


In mid-1981 NCR announced its entry 
into the merchant semiconductor mar- 
ket. The strength and discipline 
gained in 10 years of internal supply 
is now being made available to our 
customers. This experience, together 
with a family of innovative products 
and services, establishes NCR as a 
leading supplier of semiconductor 
devices and services. 


Copyright © 1985 by NCR Corporation, 
Dayton, Ohio, U.S.A. 
All Rights Reserved. Printed in U.S.A. 


Colorado Springs, Colorado 





Fort Collins, Colorado 





Miamisburg, Ohio 











NMOS Read Only Memory Family (Continued) 

                Part Number        Characteristics 

256K ROM        NCR 23256-15†      Static/27256 
                NCR 23256-20       Static/27256 
                NCR 23256-25       Static/27256 
                NCR 23256-30       Static/27256 
                NCR 23256-45       Static/27256 

                NCR 23256S-15†     Static/Standby 
                NCR 23256S-20      Static/Standby 
                NCR 23256S-25      Static/Standby 
                NCR 23256S-30      Static/Standby 
                NCR 23256S-45      Static/Standby 

                NCR 23257-15†      Static/Alt. Pin Out 
                NCR 23257-20       Static/Alt. Pin Out 
                NCR 23257-25       Static/Alt. Pin Out 
                NCR 23257-30       Static/Alt. Pin Out 
                NCR 23257-45       Static/Alt. Pin Out 

                NCR 23257S-15†     Static/Standby 
                NCR 23257S-20      Static/Standby 
                NCR 23257S-25      Static/Standby 
                NCR 23257S-30      Static/Standby 
                NCR 23257S-45      Static/Standby 

† Product available 3Q85 

Commercial Operating Temperature of 0°C to 70°C is standard for all NCR NMOS ROMs. 

Industrial Operating Temperature of -40°C to 85°C is also available. 


CMOS Read Only Memory Family 

                Part Number        Characteristics 

64K ROM         NCR 23C64-15       Static/2564 
                NCR 23C64-20       Static/2564 
                NCR 23C64-25       Static/2564 

                NCR 23C65-15       Static/2764 
                NCR 23C65-20       Static/2764 
                NCR 23C65-25       Static/2764 

128K ROM        NCR 23C128-15      Static/27128 
                NCR 23C128-20      Static/27128 
                NCR 23C128-25      Static/27128 

256K ROM        NCR 23C256-15      Static/27256 
                NCR 23C256-20      Static/27256 
                NCR 23C256-25      Static/27256 

512K ROM        NCR 23C512-15†     Static/27512 
                NCR 23C512-20      Static/27512 
                NCR 23C512-25      Static/27512 

1024K ROM       NCR 23C1000-25†    Static 

† Product available 3Q85 

Commercial operating temperature of 0°C to 70°C is standard for all NCR CMOS ROMs. 


Read Only Memories 

NCR offers a full line of high performance Read Only Memories (ROM) 
with a variety of pinouts and access times. All NCR ROMs are 5 volt 
only in both commercial and industrial operating temperature ranges. 
The NCR NMOS and CMOS processes and experience in the ROM market 
allow NCR to provide fast turnaround of prototype and production 
quantities plus provide the customer service and support required of 
a major supplier of ROMs in today's market. Look to NCR for your ROM 
requirements to insure that your products reach the market place in 
time for maximum market penetration. 


Supply current 
Organization Max (mA) 
Operating | Standby | 


NMOS Read Only Memory Family 





































NCR 2316-20    200   75   24   Static/2716 
NCR 2316-25    250   75   24   Static/2716 
NCR 2316-30    300   75   24   Static/2716 
NCR 2316-45    450   75   24   Static/2716 

NCR 2332-20          75        Static/2532 
NCR 2332-25          75        Static/2532 
NCR 2332-30          75        Static/2532 
NCR 2332-45          75        Static/2532 

NCR 2333-20                    Static/2732 
NCR 2333-25                    Static/2732 
NCR 2333-30                    Static/2732 
NCR 2333-45                    Static/2732 













































NCR 2364-20 8Kx8 Static/2564 
NCR 2364-25 8Kx8 Static/2564 
NCR 2364-30 8Kx8 Static/2564 
NCR 2364-45 8Kx8 Static/2564 









































NCR 2364S-20 8Kx8 Static/Standby 
NCR 2364S-25 8Kx8 Static/Standby 
NCR 2364S-30 8Kx8 Static/Standby 
NCR 2364S-45 8Kx8 Static/Standby 
NCR 2364A-45* Two 4Kx8 Banks Static/Bank Select 


























































NCR 2365-20 8Kx8 Static/2764 
NCR 2365-25 8Kx8 Static/2764 
NCR 2365-30 8Kx8 Static/2764 
NCR 2365-45 8Kx8 Static/2764 
NCR 2365S-20 8Kx8 Static/Standby 
NCR 2365S-25 8Kx8 Static/Standby 
NCR 2365S-30 8Kx8 Static/Standby 
NCR 2365S-45 8Kx8 Static/Standby 


































128K ROM 
NCR 23128-15†    16Kx8             Static/27128 
NCR 23128-20     16Kx8             Static/27128 
NCR 23128-25     16Kx8             Static/27128 
NCR 23128-30     16Kx8             Static/27128 
NCR 23128-45     16Kx8             Static/27128 
NCR 23128S-15†   16Kx8             Static/Standby 
NCR 23128S-20    16Kx8             Static/Standby 
NCR 23128S-25    16Kx8             Static/Standby 
NCR 23128S-30    16Kx8             Static/Standby 
NCR 23128S-45    16Kx8             Static/Standby 
NCR 23128A-30*   Four 4Kx8 Banks   Static/Bank Select 
NCR 23128A-45*   Four 4Kx8 Banks   Static/Bank Select 

* Licensed under U.S. Patent Number 4368515 
† Product available 3Q85 




Electrically Erasable PROM 


The NCR family of EEPROMs includes small organization serial devices 
for applications requiring a limited amount of storage capability. 
The NCR family also includes high density by eight devices for 
applications requiring maximum data storage. All members of the NCR 
EEPROM family are 5 volt only devices with all erase/write voltages 
being generated on chip. This combination of high density and 5 volt 
only operation places NCR in the leadership position in EEPROMs. NCR 
EEPROMs are offered in commercial, industrial, and military 
temperature ranges. 


Electrically Erasable PROM 


Access Time | Power Supply Operating No. of 
Max (Volts) Range (°C) Pins 
































256 Bit    NCR 52801       0 to +70                Serial 
EEPROM     NCR 52801 I     -40 to +85              Serial 
           NCR 59306       0 to +70                Serial 
           NCR 59306 I     -40 to +85              Serial 

           NCR 52832       0 to +70      28/32*    Parallel 
           NCR 52832 I     -40 to +85    28/32*    Parallel 
           NCR 52832 HR    -55 to +125   28/32*    Parallel 














*28 Pin DIP or 32 Pin LCC 





Non-Volatile RAM 


Non-volatile RAM (NVRAM) circuits combine high performance static 
RAM with electrically erasable PROM on a single integrated circuit. 
The primary advantage NVRAMs offer the system designer is ease of 
interfacing with a microprocessor without affecting system 
performance. This is possible because an NVRAM looks and performs 
like a static RAM during normal operation. During a system power 
failure the entire contents of the static RAM can be stored in the 
EEPROM array and are available for recall when system power returns 
to normal levels. NCR NVRAMs are offered in commercial, industrial, 
and military temperature ranges. 
Non-Volatile RAM 





















                                              Access Time  Power Supply  No. of  Operating 
                 Part Number    Organization  Max (ns)     (Volts)       Pins    Range (°C) 
256 Bit NVRAM    NCR 52210      64x4          300          +5            18      0 to +70 
                 NCR 52210 I    64x4          300          +5            18      -40 to +85 
                 NCR 52210 HR   64x4          450          +5            18      -55 to +125 
512 Bit NVRAM    NCR 52211      128x4         300          +5            18      0 to +70 
                 NCR 52211 I    128x4         300          +5            18      -40 to +85 
                 NCR 52211 HR                 450          +5            18      -55 to +125 
1K NVRAM         NCR 52212                                               18      0 to +70 
                 NCR 52212 I                                                     -40 to +85 
                 NCR 52212 HR                                                    -55 to +125 
1K NVRAM         NCR 52001                                                       0 to +70 
                 NCR 52001 I                                                     -40 to +85 
                 NCR 52001 HR                                                    -55 to +125 
2K NVRAM         NCR 52002                                                       0 to +70 
                 NCR 52002 I                                                     -40 to +85 
                 NCR 52002 HR                                                    -55 to +125 
4K NVRAM         NCR 52004                                                       0 to +70 
                 NCR 52004 I                                                     -40 to +85 
                 NCR 52004 HR                                                    -55 to +125 




Design and Applications 
Assistance 


You have the option of using one of the Semicustom Design Centers 
for design services and support, or you may prefer to purchase a 
workstation and design your device in your facility. In either case, 
NCR will provide full engineering support. 

Options include: 

• completing design verification in your facility 
• designing the device at the NCR facility or an NCR Design Center 
• permitting NCR or an NCR Design Center to perform design 
  verification and provide a device which meets your logic 
  specifications 


CAD Tools 


Semicustom design and development 
are done with the most sophisticated 
tools available. NCR is committed to re- 
taining the position of industry leader in 
technology, applications support and 
service. To meet this commitment, NCR 
has acquired and/or developed the 
best CAD tools that the industry can 
offer. 


Engineering Workstation Support 


NCR is a leader in the support of the most popular and powerful 
engineering workstations. Presently, you have your choice of the 
Daisy™ or Mentor Graphics™ Workstations, and in 1985 the Valid™ 
Workstation. All of these workstations have powerful user-friendly 
software which is well suited to the design of semicustom integrated 
circuits as well as for other applications. NCR has ported its 
proprietary software, such as VITA™, to these workstations and 
interfaced to the resident software. This means that you can perform 
total design capture and verification on the workstation without the 
need for any resimulations by NCR. In addition, NCR has developed 
documentation specific to each workstation to guide you in the use 
of the NCR Semicustom Design and Verification System™ with the 
commands and procedures specific to that workstation. NCR Design 
Centers and applications engineers are available full-time to assist 
you in every phase of design and development, including hands-on 
training on a workstation. NCR is actively involved with engineering 
workstation industry leaders to continue the evolution of design 
capabilities and tools. 


Timing Analysis 


For timing analysis, NCR has developed the VITA™ (VLSI Timing and 
Interconnect Analysis) package of programs. NODE DELAY and PATH 
DELAY feature user prompts and keep track of signal names for ease 
of use. PLUG DELAY provides feedback to logic simulators for 
"realtime" simulations. These programs can be run both before 
layout, using estimated interconnect capacitances, and after layout, 
using extracted interconnect RC values and rise/fall effects on cell 
delays. 
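
As a rough illustration of what a path-delay calculation involves, the sketch below sums per-stage delays along a signal path using a simple linear model (intrinsic cell delay plus a load-dependent term). The model, the function name, and the parameter layout are assumptions chosen for illustration; the brochure does not describe how the VITA programs actually compute delays.

```python
def path_delay_ns(stages):
    """Sum the delay of each stage along a path, in nanoseconds.

    stages: list of (intrinsic_ns, ns_per_pf, load_pf) tuples, one per cell.
    Assumed model: stage delay = intrinsic_ns + ns_per_pf * load_pf.
    Before layout, load_pf would be an estimate; after layout it would be
    replaced by a value extracted from the routed design.
    """
    return sum(intrinsic + slope * load for intrinsic, slope, load in stages)
```

Running the same path twice, once with estimated loads and once with extracted loads, shows how far the pre-layout estimate was from the routed result.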


For analog simulations, NCR will pro- 
vide SPICE models for the cells, and full 
characterization data sheets. 


Layout 


Layout, using NCR enhancements to in- 


dustry standard auto-place-and-route 
(APR) programs, has become a stream- 
lined activity producing excellent re- 
sults. Customers have the option of hav- 
ing NCR perform the layout from a pro- 
vided netlist and specifications, or by 
obtaining an industry standard APR for 
in-house use. NCR is also cooperating 
with industry efforts to develop APR 
capability on engineering workstations. 


Tegas™ is a registered trademark of General Electric—CALMA Co. 
VITA™, SENTPEX™, and Semicustom Design and Verification System™ are registered trademarks of NCR Corporation. 
CAL-MP™ is a registered trademark of SILVAR-LISCO. 
Daisy™ is a registered trademark of Daisy Systems, Inc. 
Mentor Graphics™ is a registered trademark of Mentor Graphics Corporation. 
Valid™ is a registered trademark of Valid Logic Systems, Inc. 


Test Program Generation 


NCR developed the SENTPEX™ (Sentry 
Test Pattern Extractor) package of pro- 
grams. This software checks simula- 
tions of workstations or TEGAS™ V for 
compatibility with industry standard IC 
testers, converts them to tester format 
and compresses the patterns. The re- 
sults are combined with DC parameters 
and compiled to generate the test pro- 
grams used in prototype testing and 
production testing of the device. 


CAD Software Tools 


• Schematic entry and check 
• Netlist extraction 
• Logic simulation: TEGAS™ V and/or workstation-based simulation, to 
  verify functionality and provide vectors for testing the device 
• Timing Analysis: The VITA™ (VLSI Interconnect and Timing Analysis) 
  package uses both estimated interconnect loading and extracted 
  interconnect RC loading and rise/fall effects to accurately model 
  signal delays. It calculates path delays as well as providing 
  timing information to include in logic simulation. 
• Automatic Place and Route: CPR3 and CAL-MP™ optimize placement of 
  cells and automatically route the entire circuit, taking into 
  account any specified critical paths. 
• Layout Verification: includes comparison of the netlist extracted 
  from the layout to the original netlist to verify accuracy and 
  eliminate all possible layout errors, ERCs (Electrical Rule 
  Checks) and DRCs (Design Rule Checks). 
• Fault Grading: verifies test pattern quality; performed primarily 
  with TEGAS™. 
• Test Pattern Generation: SENTPEX™ package checks simulation 
  pattern compatibility with testers, converts and compresses the 
  patterns and compiles the test program. 


NCR Semicustom Design 


NCR Semicustom Design offers you the same high performance, design 
flexibility and breadth of functions as a fully-customized 
integrated circuit, while simultaneously minimizing development time 
and cost. Key elements of the NCR system include computer-aided 
design (CAD) tools, advanced process technologies, total technical 
support and a wide selection of cell functions in a state-of-the-art 
CMOS standard cell library. 

You can take the lead in design and 
development with NCR technical ex- 
pertise and foundry facilities to aid 
you in finding and implementing the 
optimal solution to your needs. 
Every phase of the design and 
development process is followed up 
with the NCR state-of-the-art support 
system, permitting more freedom 
and security to explore alternatives 
at minimal cost. 


• Performance: propagation delays less than LSTTL and HCMOS 
  technologies 

• Advanced process technology: low power CMOS 

• Directly TTL and HCMOS Compatible: no interface or pull-ups 
  required 

• Sophisticated CAD System: minimizes risk while easing and speeding 
  design, providing a first pass working part 

• Optional ROM, Static RAM, and PLA: customer definable in size and 
  organization, with the option of analog and a core microprocessor 
  on the same chip 

• Silicon Efficient: no fixed-routing channels or cell locations. 
  NCR Semicustom Design allows close packing of high-level functions 
  for minimum die size and lowest overall cost of any semicustom 
  solution 

• Many 7400/5400 equivalent functions 

• Versatile in-house assembly capability for plastic and ceramic 
  dual-in-line and chip carrier package types 







Standard Cell Microcomputer with Core Microprocessor, Sound Generator, 
I/O and Random Cell Logic. 


Cost 


Compared to discrete logic, the use of an NCR cell library device to 
integrate system logic greatly reduces system power requirements, 
board space, component cost, manufacturing cost, weight, and 
overhead costs such as rework, inventory and purchasing. Reliability 
and performance will also be improved. All these factors directly 
impact unit pricing, particularly in volume production, making a 
cell library device a mandatory design choice. 


[Figure: NCR Semicustom Design and Verification System. Blocks shown: 
schematic entry; netlist extract; rules checking; design 
verification (simulation and timing analysis); pattern checks, 
conversion, compression; fault grade option; layout checks and 
compare; compile test program; tooling and prototype fabrication. 
Schedule shown: design phase, 2-10 weeks; tooling and prototyping, 
4-8 weeks; prototype approval; production ramp-up, 8-12 weeks; 
volume production.] 







Digital Signal 
Processing 


NCR now offers two DSP VLSI devices: the NCR45CG72 is the Geometric 
Arithmetic Parallel Processor chip (GAPP) and the NCR45CM16 is the 
Multiplier Accumulator chip. Both of these devices are targeted for 
emerging digital signal processing applications. The 45CM16 is aimed 
at microprocessor-based systems that perform multiply intensive 
tasks. Examples include process control, robotics, and electronic 
instruments. The GAPP is well suited for applications in which 
operations are repetitively applied over large arrays of data. This 
includes many image processing applications such as pattern 
recognition, automatic inspection, convolution, correlation, data 
compression, and machine vision. 


Geometric Arithmetic 
Parallel Processor 


FEATURES 


• 6 x 12 systolic array of processors in CMOS VLSI 

• Highly parallel architecture 

• Nearest neighbor communication between processors 

• GAPP devices fully cascadeable 

• Overlapped I/O and computation 

• On-chip 128-Bit SRAM per processor 


The GAPP is a revolutionary architecture comprising 72 individual 
processor elements arranged in a 6 x 12, two-dimensional array. All 
of the processors operate in parallel, and each can manipulate 
different data. The massive parallelism inherent in the chip's 
architecture provides the processing power of 72 processors on a 
single piece of silicon. 


Geometric Arithmetic Parallel Processor (GAPP) 


Within each processor is a bit-serial ALU, 128 bits of RAM, and four 
single-bit latches. Three of these latches hold inputs to the ALU 
and the fourth latch allows I/O operations to be performed without 
interrupting the program execution. Thus, I/O operations can be 
overlapped with computation. Each of these processors is able to 
communicate and exchange data with its four immediate neighbors: one 
to the East, West, North, and South. 
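
To make the nearest-neighbor idea concrete, the sketch below applies one single-bit operation to every cell of a grid at once, in the SIMD style the GAPP uses. The particular operation (OR of a cell with its North, South, East, and West neighbors), the function name, and the choice to read off-array neighbors as 0 are assumptions for illustration; this is not the GAPP instruction set.

```python
def neighbor_or_step(grid):
    """Return a new grid where each bit is OR-ed with its four neighbors.

    grid: list of rows, each a list of 0/1 values (one bit per processor).
    Neighbors outside the array are assumed to read as 0.
    """
    rows, cols = len(grid), len(grid[0])

    def at(r, c):
        # Value of cell (r, c), or 0 if the position is off the array edge.
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    # Every output cell is computed from the *old* grid, so the result is
    # the same as if all processors had executed the step simultaneously.
    return [
        [at(r, c) | at(r - 1, c) | at(r + 1, c) | at(r, c - 1) | at(r, c + 1)
         for c in range(cols)]
        for r in range(rows)
    ]
```

Applied to a binary image, a step like this grows each set pixel into a plus-shaped neighborhood, which is one of the basic building blocks of the image processing tasks listed earlier.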


GAPP chips are cascadeable and allow system designers to implement 
processor arrays of arbitrary size in multiples of 6 x 12 elements. 
For instance, two GAPP chips can be configured to form a 12 x 12 
processor array, eight chips can be used to form a 24 x 24 array of 
processors, and so on. The advantage of cascading arrays of GAPP 
chips in systems is that system throughput increases linearly with 
the number of chips used in the system. Thus, a system of two GAPP 
chips offers twice the processing throughput of a single GAPP chip, 
while a system of eight chips offers eight times the processing 
throughput of a single GAPP chip and four times the processing 
throughput of a two GAPP chip system. This ability to trade off 
performance versus chip count offers the system designer virtually 
unlimited freedom in designing systems around the GAPP to meet 
specific performance needs. In addition, software compatibility can 
be maintained as system designers expand their systems by adding 
more GAPP chips to increase system performance. 
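
The chip-count arithmetic in the preceding paragraph can be checked with a short calculation. The helper names below are hypothetical; the only facts taken from the text are the 6 x 12 array per chip and the statement that throughput scales linearly with the number of chips.

```python
import math

CHIP_ROWS, CHIP_COLS = 6, 12  # processors per GAPP chip, from the text


def chips_for_array(height, width):
    """Number of 6 x 12 chips needed to tile a height x width processor array."""
    return math.ceil(height / CHIP_ROWS) * math.ceil(width / CHIP_COLS)


def relative_throughput(chips, baseline_chips=1):
    """Throughput relative to a baseline system, assuming linear scaling."""
    return chips / baseline_chips
```

For example, `chips_for_array(12, 12)` gives 2 and `chips_for_array(24, 24)` gives 8, matching the configurations named above, and `relative_throughput(8, 2)` gives 4.0, the "four times" figure for an eight-chip system compared with a two-chip one.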







The GAPP architecture is typically described by such terms as 
“systolic array,” or SIMD (Single Instruction, Multiple Data). 
Regardless of how one describes it, the GAPP is an undeniable 
departure from the traditional von Neumann architecture, which 
processes data one element at a time. The von Neumann architecture, 
for example, depends upon component technology to attain processing 
throughput. The GAPP, on the other hand, exploits parallelism rather 
than relying on component speed to achieve its throughput. Hence, 
the GAPP is able to achieve throughput rates unattainable by von 
Neumann architectures. 


NCR Semicustom Process 
Technology 


The NCR fine-geometry CMOS process provides excellent performance. 
Options include precision capacitors for analog and double level 
metal. NCR's CMOS is immune to most latch-up situations with 
protection of 90 mA at 12V. Worst case ESD (electrostatic discharge) 
is rated at 3.0kV. NCR's CMOS technology has proven to be a very 
reliable high volume process which provides circuit densities and 
performances which are extremely competitive in today’s market. 


Manufacturing 


Whether your semicustom design is performed by NCR, a design house, 
or yourself, NCR will complete your device development, produce the 
masks and fabricate the wafers in-house. 


Assembly 


NCR's fast-turn assembly facility permits short development cycles 
and rapid ramp-up for initial production. In-house packaging 
includes plastic and ceramic DIPs and chip carriers. Off-shore 
packaging capabilities offer high volume economies on all packaging 
alternatives. 


Second Source 


NCR maintains an extensive second 
source agreement with Standard Micro- 
systems Corporation which enables 
customers to activate second-sourcing 
at any point during the design, develop- 
ment or manufacturing process. 


NCR CMOS II Digital Cell Library 


The variety of cells offered allows for op- 
timization of silicon area. A smaller die 
size means better performance and 
lower costs. 


SSI Functions: 


• Buffers and Inverters 
  - drive and tristate options 

• NAND and NOR 
  - available with 2, 3, 4 inputs 

• AND and OR: up to 8 inputs 

• AOI, OAI, EXOR 

• “Combinational” logic cells 
  - for denser and faster devices 

• Delay Cells 

• Two-phase Clock Driver 


Flip-Flops/Latches: 


• Cross coupled latches 
    both NOR and NAND 
• Level sensitive transparent latches 
    with Reset 
    without Reset 
    with clock driver 
• Edge triggered D Flip-flops 
    with Reset 
    with Set and Reset 
    without Set and Reset 
    with clock driver, Set and Reset 
• Edge triggered JK flip-flops 
    with Set and Reset 
    with Set, Reset and clock driver 


MSI Functions: 


• Single-bit cascadeable loadable shift register with serial or 
  parallel in, and serial out, with or without clock driver 

• Single-bit cascadeable, loadable, up-down counter with Reset and 
  Enable, carry in and carry out 


Input/Output Pads and Buffers 


Options give optimal size in pad-limited designs. Levels are 
directly TTL and CMOS compatible. 

• Input Cells: choice of standard TTL or variety of Schmitt trigger 
  levels 

• Output Cells: variety of drive options, open drain, pullup options 

• Tristate: combination of I/O options 


CMOS II Analog Cell Library 


Op Amps 

Comparators 

Analog Switch 

Bandgap Voltage References 
Oscillators 

D/A Converters 

A/D Converters 

Flash A/D Converter 

Sound Generator 

Negative Supply Generators 
Bias Generators 

Logic Level Shifter 
Power-On-Reset 


CMOS II Supercell Library 


• Modular ROM 

• Modular RAM 

• Modular PLA 

• Counter/Timer 

• 65CX02 Core-microprocessor 


Gate Array Technology 


Gate arrays are a viable option if you have a low volume design or 
one requiring fewer functions and therefore fewer gates. Design and 
development cycles are customarily shorter and less costly for gate 
arrays. The trade-off is in design flexibility and production costs, 
since a cell library device is smaller and less costly in larger 
production quantities. 


NCR design engineers will assist you in 
making the most cost-effective decision 
to meet your needs, whether it is a cell 
library device or a gate array. 


NCR Quality Assurance 


The NCR Microelectronics goal in all design projects is to meet or 
exceed the customer's quality and reliability requirements by 
building quality in. Each of NCR's processes and products has been 
extensively characterized and qualified. Design Assurance Engineers 
have worked closely with Standard Cell Designers and Computer-Aided 
Design Software Engineers to help assure first pass design success 
for all customers using Standard Cells. Each cell has been fully 
characterized and subjected to the same rigorous reliability testing 
used to qualify the process itself. In addition to the initial 
qualification, the Quality Assurance Department samples parts from 
each product and performs on-going reliability testing to maintain a 
high level of confidence in fabrication and assembly operations. 
Each part receives full functional testing and visual inspection 
prior to shipment. 

As a result of exceedingly high stand- 
ards and the desire to be a leader, NCR 
Microelectronics has one of the lowest
part reject records in the industry. 




NCR/32 
Processor Family 


Features 

• 32-bit system architecture
• 13.3 Megahertz frequency
• Effective emulation of mid-range mainframes
• Externally microprogrammable
• Real and virtual memory operation
• Large direct memory addressing
• Interface provided to slower peripherals
• On-chip error check and correction


Functional Description 


The NCR/32 VLSI Processor family 
combines the latest advances in semi- 
conductor technology with experi- 
ence gained in three generations of 
computer mainframe design to pro- 
vide a comprehensive microprogram- 
mable 32-bit system architecture. With 
external microprogram capability, an 
extremely flexible microinstruction set, 
and a powerful set of internal regis- 
ters, the NCR/32 offers flexibility and 
high performance advantages not 
available with other microprocessors. 


Along with an existing set of VLSI 
family support devices, the NCR/32 
offers effective emulation of register, 
stack and descriptor-based system ar- 
chitectures, as well as execution of 
high-level languages directly from microcode.
The NCR/32 is well suited
for applications requiring direct ad- 
dressing of a large memory space, 
high numeric precision, and very- 
high-speed execution such as bit- 
mapped graphics, robotics, artificial 
intelligence, and relational data 
bases. 


[Figure: NCR/32 Family Architecture block diagram. The only legible label is "NCR/32-010 ATC".]


The NCR/32 VLSI Processor family 
consists of the Central Processor Chip 
(CPC), the Address Translation Chip
(ATC), and the System Interface Con- 
troller (SIC). Additional members of 
the family include the Extended 
Arithmetic Chip (EAC), the System In- 
terface Transmitter (SIT) and Receiver 
(SIR) chips, and the Bus Assist Chip 
(BAC). 


The CPC performs the basic micro- 
processing function using four 32-bit 
internal data paths, complemented by
two independent external data paths: 
the 32-bit Processor Memory (PM) 
Bus and the 16-bit Instruction Storage 
Unit (ISU) Bus. An integral part of the 
CPC is the Arithmetic Logic Unit 
(ALU) which is used for performing 
decimal and binary arithmetic func- 
tions and logical operations. There are 
two sets of registers in the CPC. The
Register Storage Unit consists of 16
32-bit registers used for storage and
manipulation of data; the additional 22
registers of the Internal Register Unit
are used as jump address registers
and operand pointer registers. A
three-stage pipeline ensures that one
microinstruction is being fetched,
another read, and a third executed in
the same time frame.
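The overlap described for the three-stage pipeline can be pictured with a small model. This is only an illustrative sketch: the stage names follow the text above, while the function name, the microinstruction labels, and the assumption that every stage takes exactly one cycle are hypothetical and not taken from NCR documentation.

```python
# Illustrative model of a three-stage pipeline (fetch, read, execute).
# Assumption: every stage takes exactly one cycle and nothing stalls.
STAGES = ("fetch", "read", "execute")


def pipeline_schedule(microinstructions):
    """Return one dict per cycle mapping each stage to its occupant."""
    count = len(microinstructions)
    schedule = []
    for cycle in range(count + len(STAGES) - 1):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that entered `depth` cycles ago
            row[stage] = microinstructions[index] if 0 <= index < count else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["u0", "u1", "u2", "u3"])):
        print(cycle, row)
```

Once the pipeline is full, each cycle shows one microinstruction in every stage, which is the situation the paragraph describes.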


The system clock is a two-phase, 
non-overlapping clock operating at 
13.3 MHz. This yields a 150 nanosecond
clock cycle with 90% of the microinstructions
executing in one cycle.
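The relation between the quoted frequency and cycle time can be checked with a few lines of arithmetic. The sketch below makes two assumptions that the text does not state: that one 150 ns cycle spans two periods of the 13.3 MHz clock, and that the remaining 10% of microinstructions take two cycles each (a hypothetical figure used only to show the calculation).

```python
# Back-of-the-envelope timing check for the figures quoted above.
CLOCK_HZ = 13.3e6   # system clock frequency from the text
CYCLE_NS = 150.0    # cycle time from the text

clock_period_ns = 1e9 / CLOCK_HZ               # about 75.2 ns
periods_per_cycle = CYCLE_NS / clock_period_ns  # about 2 (assumed relation)


def average_time_ns(single_cycle_fraction=0.90, slow_cycles=2):
    """Average time per microinstruction.

    slow_cycles is a hypothetical cost for the microinstructions
    that do not complete in one cycle.
    """
    average_cycles = (single_cycle_fraction * 1
                      + (1.0 - single_cycle_fraction) * slow_cycles)
    return average_cycles * CYCLE_NS


if __name__ == "__main__":
    print(f"clock period: {clock_period_ns:.2f} ns")
    print(f"clock periods per cycle: {periods_per_cycle:.2f}")
    print(f"average microinstruction time: {average_time_ns():.1f} ns")
```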


GAPP Development System 


To support software development, there 
is a GAPP Evaluation Module which 
consists of a software development 
package and hardware accelerator 
board for IBM compatible personal
computers. Development software for 
the Evaluation Module includes the 
GAPP Algorithm Language compiler, 
and a program debugger which allows 
single and multiple step execution. In
addition, the programmer is able to ex- 
amine and change internal registers 
and RAM locations in each processor 
element. 


Also available is a GAPP Simulator/ 
Assembler which allows the program- 
mer to simulate GAPP programs on 
processor arrays of arbitrary size. The 
Simulator/Assembler allows the user to 
write and debug programs in GAPP 
microcode, and examine internal registers
and RAM locations.


16 x 16 Single Port Multiplier/
Accumulator Chip


FEATURES 


• 24-pin ceramic or plastic DIP
• 40-bit accumulator
• 190ns cycle time (typ)
• Fully static operation: no clock required
• Single port allows easy interfaces to
  microprocessor bus


The NCR45CM16 is a 24-pin CMOS
multiplier/accumulator chip for use with
16-bit microprocessor systems. All input
and output data are transferred through
a single 16-bit bidirectional data bus in
signed two's complement format. This
device is TTL/CMOS compatible and requires
no clock due to its totally static
(asynchronous) operation. The
45CM16 may be attached to a microprocessor
bus in a way similar to a 16-bit
wide static RAM.


The single port design of the 45CM16 
makes it much more compact than 
three port devices. Another compara- 
tive advantage of the 45CM16 relative 
to three port multiply/accumulate chips 
is that there is no need to use a lot of 
glue logic to interface it to the micro- 
processor bus. Static operation frees 
the system designer from having to 
generate clock signals to control the 
device. These three attributes: small 
package, ease of interface to micropro- 
cessor bus, and static operation mean 
that boards designed with the 45CM16 
will be more compact and easier to de- 
sign. 

An 8086 or 68000 using the 45CM16 
can realize a 3X enhancement in overall 
multiplication speed compared with 
performing the multiplication operation 
in software using the 68000 instruction 
set. The 40-bit accumulator allows 32- 
bit partial products to be accumulated 
up to 256 times before the contents of 
the accumulator must be read. 
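The figure of 256 accumulations follows from the accumulator width: a 40-bit accumulator has 40 - 32 = 8 bits of headroom above a 32-bit product, and 2^8 = 256. The sketch below models that arithmetic in software. It is not a description of the 45CM16 register interface, which the text does not give; the class and method names are hypothetical.

```python
# Software model of a 16 x 16 multiply-accumulate with a 40-bit accumulator.
# Names are hypothetical; only the bit widths come from the text.
ACC_BITS = 40
OPERAND_BITS = 16


def wrap_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


class MultiplierAccumulator:
    def __init__(self):
        self.acc = 0

    def clear(self):
        self.acc = 0

    def multiply_accumulate(self, x, y):
        """Add the signed product x * y to the accumulator and return it."""
        x = wrap_signed(x, OPERAND_BITS)
        y = wrap_signed(y, OPERAND_BITS)
        self.acc = wrap_signed(self.acc + x * y, ACC_BITS)
        return self.acc


if __name__ == "__main__":
    mac = MultiplierAccumulator()
    for _ in range(256):
        mac.multiply_accumulate(-32768, -32768)  # largest-magnitude product
    print(mac.acc, mac.acc < (1 << (ACC_BITS - 1)))
```

Even with the largest possible product on every step, 256 accumulations stay inside the signed 40-bit range, so the host only needs to read the result once per block.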




NCR/32-796A 
Board 


Board Highlights 


• Dual port main memory access capability
  using either Multibus or iLBX*
• Full 32-bit VLSI Chip Set
  - Central Processor Chip (CPC)
  - Address Translation Chip (ATC)
  - Extended Arithmetic Chip (EAC)
• 150ns instruction/PM bus cycles
• Real and virtual memory operations
• On-board breakpoint and movable
  window trace capability
• 4K words of ROM containing
  diagnostics and debug routines
• 16K words of on-board RAM for
  user-defined microcode


The NCR/32-796A board, featuring the 
NCR/32 Chip Set, provides new oppor- 
tunities for microcode generation at the 
microprocessor level. The board pro- 
vides an alternate iLBX I/O port for high- 
speed memory transfers. A wide range 
of user applications include: 

• Dedicated algorithmic processing
• File processing in intelligent networks
• Graphics co-processing
• Robotics control
• Virtual machine emulation
• High-level language acceleration
• Image recognition


The NCR/32-796A board includes an 
Instruction Storage Unit (ISU) providing
16K words of storage for user micro- 
code. Use of the Extended Arithmetic 
Chip (EAC) offers the following math capabilities:
single and double precision
fixed-point binary multiplication and division,
single and double precision
floating-point hexadecimal (IBM format),
floating-point decimal, and format conversion.


Resident microcode-development 
firmware makes breakpoint and trace 
logic readily accessible via onboard 
ROM. Additional development interface 
and assembler software is also availa- 
ble. 


*Multibus and iLBX are trademarks of Intel Corporation.


[Figure: Board Block Diagram. Legible labels: Buffer, Trace, Transfer Register, Breakpoint, Read/Write, Multibus Interface, Scratchpad RAM (word x 32), Address Register.]





The ATC provides memory manage- 
ment functions using either virtual or 
real memory addressing. To support 
virtual memory operations in the NCR/ 
32 chipset, an extra PM bus cycle 
precedes the standard memory ac- 
cess. Two 32-bit registers, the TOD 
Register/Counter and the Interval Ti- 
mer Monitor Register, are used for 
time interval monitoring. An NCR- 
patented ‘scrubbing’ technique 
checks, and corrects if necessary, a 
64K word block of memory every 
1.048 seconds. The ATC has three 
virtual address page sizes: 1K, 2K, 
and 4K bytes. 
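Two of the figures in this paragraph lend themselves to a quick worked example. The sketch below splits an address into page number and offset for the three page sizes, and estimates the scrub interval per word. It assumes plain byte addresses and a scrub pass spread evenly over the 1.048 seconds; neither detail is stated in the text, and the function names are hypothetical.

```python
# Illustrative arithmetic for the ATC page sizes and memory scrub rate.
PAGE_SIZES = (1024, 2048, 4096)   # 1K, 2K and 4K bytes, from the text

SCRUB_WORDS = 64 * 1024           # 64K word block
SCRUB_PERIOD_S = 1.048            # time to scrub the whole block


def split_virtual_address(address, page_size):
    """Return (page_number, offset) for a byte address (assumed layout)."""
    if page_size not in PAGE_SIZES:
        raise ValueError("page size must be 1K, 2K or 4K bytes")
    offset_bits = page_size.bit_length() - 1   # 10, 11 or 12
    return address >> offset_bits, address & (page_size - 1)


def scrub_interval_per_word_s():
    """Average time between scrubbed words, assuming an even spread."""
    return SCRUB_PERIOD_S / SCRUB_WORDS


if __name__ == "__main__":
    for size in PAGE_SIZES:
        page, offset = split_virtual_address(0x12345, size)
        print(f"{size:5d}-byte pages: page {page}, offset {offset:#x}")
    print(f"about {scrub_interval_per_word_s() * 1e6:.1f} microseconds per word")
```

Under the even-spread assumption, one word is checked roughly every 16 microseconds.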


The EAC is a performance booster 
used during arithmetic operations. 
Fixed point, decimal, and hexadecimal
floating point formats are all
handled by the EAC. (Hexadecimal
floating point format is compatible with
the IBM/370.) Results are in either
single (one word) or double (two 
words) precision. Conversion opera- 
tions between formats are also 
handled. 
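For readers unfamiliar with the IBM/370 hexadecimal floating point format, the sketch below decodes a single precision (one word) value in software. It uses the standard System/370 short format: one sign bit, a 7-bit excess-64 exponent of base 16, and a 24-bit fraction. How the EAC lays out its operands internally is not described here, so this is only a reference model, and the function name is hypothetical.

```python
# Reference decoder for the IBM System/370 short hexadecimal float format.
# value = (-1)^sign * 0.fraction * 16^(exponent - 64)
def decode_ibm_hex_single(word):
    """Convert a 32-bit IBM hexadecimal floating point word to a float."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("word must fit in 32 bits")
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                  # excess-64, base 16
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # 0 <= fraction < 1
    return sign * fraction * 16.0 ** (exponent - 64)


if __name__ == "__main__":
    for word in (0x41100000, 0xC276A000, 0x00000000):
        print(f"{word:#010x} -> {decode_ibm_hex_single(word)}")
```

Because the exponent is a power of 16 rather than 2, the fraction may carry up to three leading zero bits, which is the main practical difference from binary floating point formats.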


The SIC performs communication 
management between the NCR/32 
chipset and the I/O devices. Used 
with the SIT and SIR (which perform 
data format conversions) the SIC 
sends and receives messages at up 
to 24 megabits per second per chan- 
nel. The SIC/SIT/SIR communications 
subsystem operates in either Data 
Link Control mode or Local Area
Network mode. In the Data Link Control
mode, the SIC has access to eight
transmission channels through a polling
scheme. This mode is designed to
control multiple peripheral devices on
a system. The Local Area Network
mode is designed for high-speed
transmissions in a network environment,
using two different channels of
access.


The NCR/32 Development System is 
available to help in evaluating the 
NCR/32 chipset and in developing 
microcode for particular system applications.
A complete development system
consists of two NCR components,
the NCR/32-796A Board and the 
NCR/32 Debug Monitor along with the 
following: 


• An IBM-compatible PC
• A relocatable, linkable assembler
• A Multibus™ development environment,
  including:
  - a chassis
  - an adapter kit consisting of a Multibus
    board and a PC board
  - a memory board


In addition, experienced NCR applications
engineers can assist in determining
the suitability of the NCR/32 family
for solving applications problems. 
These engineers can provide extensive 
training on the NCR/32 systems archi- 
tecture, individual chips, and the use of 
design support tools. 


Multibus is a registered trademark of Intel Corp.




NCR Microprocessors 


NCR 6518 8-bit Microprocessor utilizing the 6507 CPU. 

Contains 128x8 Static RAM, two bi-directional programmable I/O ports, programmable 
interval timer. 

NCR 65C02 8-bit Microprocessor, software compatible with the NMOS 6502. 2 or 3 MHz operation, 
64K-byte addressable memory, low power consumption 4mA @ 1 MHz. 

NCR 65C21 Peripheral Interface Adapter, with two 8-bit bidirectional I/O ports, and four peripheral
control/interrupt input lines.

NCR 65C22 Versatile Interface Adapter with internal timer/counters. Compatible with NMOS 6522.
Two powerful 16-bit programmable internal timer/counters. Latched input/output registers
on both I/O ports.

NCR 65CX02 Identical to 65C02 except for the addition of four bit manipulation instructions (SMB, 
RMB, BBS, BBR). Will operate at 2, 3, or 4 MHz. 
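The four bit manipulation instructions are only named above. As commonly documented for 65C02 variants that include them, SMB and RMB set or reset a single bit of a zero-page byte, and BBS and BBR branch when a given zero-page bit is set or reset. The sketch below models that behavior in software; the function names, the bytearray standing in for zero page, and the `next_pc` convention are illustrative assumptions and not NCR documentation.

```python
# Behavioral sketch of SMB, RMB, BBS and BBR on a zero-page byte.
# `next_pc` is the address of the instruction following the branch.
def _check_bit(bit):
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")


def smb(zero_page, address, bit):
    """SMBn: set bit n of the zero-page byte."""
    _check_bit(bit)
    zero_page[address] |= 1 << bit


def rmb(zero_page, address, bit):
    """RMBn: reset (clear) bit n of the zero-page byte."""
    _check_bit(bit)
    zero_page[address] &= ~(1 << bit) & 0xFF


def _branch_target(next_pc, offset):
    if not -128 <= offset <= 127:
        raise ValueError("offset must be a signed 8-bit displacement")
    return (next_pc + offset) & 0xFFFF


def bbs(zero_page, address, bit, next_pc, offset):
    """BBSn: return the branch target if bit n is set, else next_pc."""
    _check_bit(bit)
    taken = bool(zero_page[address] & (1 << bit))
    return _branch_target(next_pc, offset) if taken else next_pc


def bbr(zero_page, address, bit, next_pc, offset):
    """BBRn: return the branch target if bit n is reset, else next_pc."""
    _check_bit(bit)
    taken = not zero_page[address] & (1 << bit)
    return _branch_target(next_pc, offset) if taken else next_pc


if __name__ == "__main__":
    zp = bytearray(256)
    smb(zp, 0x10, 3)
    print(hex(zp[0x10]), hex(bbs(zp, 0x10, 3, 0x1003, -3)))
```

The practical benefit is that a flag bit in zero page can be tested or changed in one instruction, without a load, mask, and store sequence.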

Microcomputers 


All parts have I/O capabilities of 32 bi-directional lines, are powered by a 5V power supply,
and are packaged in a 40 pin DIP.

Part           Timer                        Clock          Other features
NCR 6500/1     16-bit programmable          1, 2, 3 MHz
NCR 6500/11    (2) 16-bit programmable      1 or 2 MHz     Full duplex UART, 10 interrupts
NCR 65C00/1    16-bit programmable          1, 2, 4 MHz    Low power: 4mA/MHz max, 1mA/MHz typical
NCR 65C00/2    16-bit programmable          1, 2, 4 MHz    Low power: 4mA/MHz max, 1mA/MHz typical
NCR 65C00/3    16-bit programmable          1, 2, 4 MHz    Low power: 4mA/MHz max, 1mA/MHz typical


Special Function Chips 


SCSI

NCR 5380 SCSI Protocol Controller
Supports latest ANSI X3T9.2 SCSI draft-proposed standard. Asynchronous data transfers
to 1.5 Megabytes/sec. Operates in both initiator and target roles. Supports arbitration
including reselection. Contains on-chip open collector (48 mA at .5V) bus
transceivers. Requires +5V supply in a 40 pin DIP.

NCR 5385E SCSI Protocol Controller
Enhanced 5385 supports the latest ANSI X3T9.2 SCSI Standard. Asynchronous data
transfers to 1.5 Megabytes/sec. Operates in both initiator and target roles. Supports
arbitration including reselection. Uses external open collector or differential pair transceivers.
Double buffered data registers, 24-bit transfer counter and automatic protocol handling
provide a high performance interface. Requires +5V supply in a 48 pin DIP.

NCR 5386 SCSI Protocol Controller
Replacement for NCR 5385. Updates all SCSI timings to latest ANSI specification with
operational enhancements. Production June '85.


Graphics 


NCR 7250 CRT Controller
On-chip character ROM with 192 characters. Addresses a 2Kx8 video RAM. Generates
VSYNC, HSYNC and VIDEO to interface directly with CRT monitor. Eight screen and
six field functions are under software control. Dot clocks up to 20MHz with +5V supply
in a 40 pin DIP.

NCR 7300 Color Graphics Controller
Translates high level commands from host computer into video operations such as
drawing and text manipulation, and provides video output to monitor. Supports a
displayable screen resolution of 640x480 pixels at 60Hz, and a frame buffer of
1024x1024. Has analog RGB outputs, and pixel rates to 30 MHz. Interfaces to 8-bit or
16-bit processor. Housed in 68 pin package and uses +5V supply.

NCR 7301 Memory Interface Controller
Companion chip to NCR 7300. Multiplexes and demultiplexes between four and sixteen
bit busses. Designed for implementation of high performance graphics systems
and similar applications requiring rapid data handling. Requires +5V supply in a 28
pin DIP.


Other 


NCR 8301 Bar Code Processor
Decodes code 39 and interleaved 2 of 5, bidirectional decoding, velocity of 1 to 50 in/sec
with 32-character tag buffer. Standalone or peripheral mode with +5V supply in a
40 pin DIP.

NCR 8489 Sound Generator
Functionally and pin compatible with SN76489A. Three programmable tone generators.
Programmable white noise generator with 4 MHz (max) clock input. Requires +5V supply
in a 16 pin DIP.
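Since the NCR 8489 is described as functionally compatible with the SN76489A, the tone frequency relation commonly documented for that part should apply: each tone generator divides the input clock by 32 times a 10-bit value. The sketch below works through that relation. The formula comes from general SN76489A documentation rather than from this brochure, and the function names are hypothetical.

```python
# Tone frequency arithmetic for an SN76489A-style sound generator.
# Assumed relation: f = clock / (32 * N), with N a 10-bit divider value.
MAX_DIVIDER = 1023


def tone_frequency_hz(clock_hz, divider):
    """Output frequency for a given divider value (1..1023)."""
    if not 1 <= divider <= MAX_DIVIDER:
        raise ValueError("divider must be in 1..1023")
    return clock_hz / (32.0 * divider)


def divider_for(clock_hz, target_hz):
    """Nearest divider value for a target frequency, clamped to the valid range."""
    divider = round(clock_hz / (32.0 * target_hz))
    return max(1, min(MAX_DIVIDER, divider))


if __name__ == "__main__":
    clock = 4_000_000  # maximum clock input quoted above
    n = divider_for(clock, 440.0)
    print(n, round(tone_frequency_hz(clock, n), 2))
```

At the 4 MHz maximum clock, the lowest tone under this relation is about 122 Hz (divider 1023), so a slower clock would be needed for bass notes.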







NCR Microelectronics
Area Sales Offices

Eastern Area
NCR Microelectronics Division
400 W. Cummings Park
Suite 2750
Woburn, MA 01801
Phone: (617) 933-0778

Central Area
NCR Microelectronics Division
400 Chisholm Place
Suite 100
Plano, TX 75075
Phone: (214) 578-9113

Western Area
NCR Microelectronics Division
4655 Old Ironsides Drive
Suite 400
Santa Clara, CA 95050
Phone: (408) 727-6575


NCR Microelectronics
Manufacturing Plants

Semicustom Design
Digital Signal Processing
NCR Microelectronics Division
2001 Danfield Court
Fort Collins, CO 80525-2998
Phone: (303) 226-9500 or 223-5100
Telex: 45-4505 NCRMICRO FTCN

Non-Volatile Memories
Read-Only Memories
NCR Microelectronics Division
8181 Byers Road
Miamisburg, OH 45342
Phone: (800) 543-5618
(513) 866-7471 in Ohio or International
Telex: 241669 NCR NVMEM MSBG

Microprocessors/Peripherals
NCR Microelectronics Division
1635 Aeroplaza Drive
Colorado Springs, CO 80916
Phone: (800) 525-2252
(303) 596-5611 in Colorado or International
Telex: 452-457 NCR MICRO CSP


NCR Microelectronics
Design Centers

Semicustom Design Center Locations

Aptek Microsystems
700 N.W. 12th Avenue
Deerfield Beach, FL 33441
(305) 421-8450
Contact: Trygve (Tryg) Ivesdal

Integrated Circuit Systems, Inc.
1012 W. Ninth Avenue
King of Prussia, PA 19406
(215) 265-8690
Contact: Ed Arnold or Jere Hohmann

Custom Silicon, Inc.
600 Suffolk Street
Lowell, MA 01854
(617) 454-4600
Contact: David Guinther

Ontario Centre for Microelectronics
1150 Morrison Drive
Suite 400
Ottawa, Canada K2H 9B8
(613) 596-6690
Contact: Dr. Karl Siemens

Design Engineering, Inc.
1900 13th St., Suite 304
Boulder, CO 80302
303/440-7997
Contact: Steve Davis

Manhattan Skyline, Ltd.
United Kingdom
Manhattan House
Bridge Road
Maidenhead
Berkshire SL6 8DB
England
Maidenhead (0628) 75851
Contact: Stu Kitchiner

Array Technology
1297 Parkmoor Avenue
San Jose, CA 95126
408/297-3333
Contact: Dan Weed




[Handwritten note, translated from Italian; partly illegible]

GAPP emulation (NCR parallel processor): 2 solutions

1. Emulation board + assembler disk, to be installed in an IBM-compatible PC.
   Software runs under MS-DOS.
   Price: [figure illegible]
   Delivery: from stock

2. Software only (simulator/cross-assembler) for VAX,
   under VMS and UNIX.
   Price: 2,400,000 lire [reading uncertain]
   Delivery: 20 to 40 days


C.G. ELECTRONICS s.r.l.
Via P. Giardini, 460 - 41100 Modena (Italy)
Telef. 059/34 21 66
Telex 226413 GIG.MO1

[Handwritten letter in Italian, largely illegible. The legible fragments refer to a request for technical documentation concerning the GAPP PC Development System and the GAPP Simulator/Assembler, and to specific data sheets.]








[Goods accompanying note (bolla di accompagnamento beni viaggianti, D.P.R. 6 Ottobre 1978 n. 627), no. AB 6838962; form largely illegible]

Company, residence or domicile (municipality, street, no.), tax code:
SKYLAB srl
P.le Carbonari, 12
20125 MILANO
Partita IVA 02303480152

Other form fields (blank or illegible): sender, recipient, secondary office or branch, transport, driver's signature, recipient's signature.





































